DECOMPOSITION OF THE KANTOROVICH PROBLEM AND
WASSERSTEIN DISTANCES ON SIMPLEXES 

DANILA ZAEV 


Abstract. Let $X$ be a Polish space, $\mathcal{P}(X)$ be the set of Borel probability measures on $X$,
and $T\colon X \to X$ be a homeomorphism. We prove that for the simplex $\mathrm{Dom} \subset \mathcal{P}(X)$ of all
$T$-invariant measures, the Kantorovich metric on Dom can be reconstructed from its values on the
set of extreme points. This fact is closely related to the following result: the invariant optimal
transportation plan is a mixture of invariant optimal transportation plans between extreme
points of the simplex. The latter result can be generalized to the case of the Kantorovich
problem with additional linear constraints and the class of ergodic decomposable simplexes.

1. Introduction

In this paper we study an important modification of the Kantorovich problem: the problem
of mass transportation with additional linear constraints. Recent developments in the
general theory of optimal transport are described in [?], [?], [?], [?].

The transport problem with linear constraints appears naturally in several applications, in
particular in financial mathematics (the martingale problem) and in infinite-dimensional analysis
(the invariant transport problem w.r.t. the action of some group); see [4], [16], [18], [27].

Many fundamental problems of mass transport theory (e.g., the problem of existence of a
transport map) for measures on infinite-dimensional spaces lead us to study transportation
of ergodic measures and the relations between the structure of optimal transport plans and
ergodic decompositions (see [16] for details).


Date : November 5, 2015. 

Key words and phrases. Kantorovich problem, Wasserstein distance, ergodic decomposition, Choquet
simplex, invariant measure, disintegration of measures, sufficient statistic, Markov kernel.



In this paper we answer the following question: when is it possible to ergodically
decompose an optimal transport plan into optimal plans between decompositions of marginals?

Let us start with an example. Let $G$ be a sufficiently nice group (locally compact, amenable)
acting continuously on a compact metric space $(X, d)$. Its action on $X \times X$ is defined in the
diagonal way: $g(x_1, x_2) := (g(x_1), g(x_2))$, where $g$ also denotes the action of the element $g \in G$ on $X$.
A measure $\mu$ on $(X, d)$ is called invariant w.r.t. $G$ iff $\mu \circ g^{-1} = \mu$ for every $g \in G$. Denote by
$\mathcal{P}_G(X) \subset \mathcal{P}(X)$ the set of all invariant Borel probability measures on $(X, d)$, equipped with
the topology of weak convergence and the corresponding Borel $\sigma$-algebra. It is well known
that $\mathcal{P}_G(X)$ is a nonempty compact convex set. Like any convex set, it contains the subset
of all extreme points, i.e. points that are midpoints of no line segment with endpoints in
$\mathcal{P}_G(X)$. Let us denote this set by $\partial_e(\mathcal{P}_G(X))$ and call it the boundary of $\mathcal{P}_G(X)$. It is clear
that the extreme points of $\mathcal{P}_G(X)$ are exactly the $G$-ergodic measures. It is also known (see [?] for
a proof) that $\mathcal{P}_G(X)$ is a simplex. This means that each point of $\mathcal{P}_G(X)$ can be represented in
a unique way as the barycenter of a Borel probability measure concentrated on the boundary
$\partial_e(\mathcal{P}_G(X))$.

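Assume that $\mu, \nu$ are given measures from $\mathcal{P}_G(X)$. Consider the following optimization
problem, which is usually called the Kantorovich problem [?], [?], [?]:

$$\inf\left\{ \int c \, d\pi \;:\; \pi \in \mathcal{P}(X \times X),\ \mathrm{Pr}_1(\pi) = \mu,\ \mathrm{Pr}_2(\pi) = \nu \right\}.$$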

Here $c\colon X \times X \to \mathbb{R}$ is a continuous function bounded below, and $\mathrm{Pr}_k(\pi)$ is the $k$-th marginal
of the measure $\pi$. One can think of this problem as the problem of finding an optimal way
to transfer one mass distribution (the measure $\mu$) to another one (the measure $\nu$) with respect to the
transfer cost $c$. The measures $\pi$ are called transport plans, and can be thought of as generalized
maps from one measure space to the other.
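
For intuition, the problem can be solved by hand when both measures live on a two-point space. The following Python sketch is not taken from the paper: the function name `kantorovich_2x2` and the sample data are illustrative assumptions. On $\{0, 1\}$ a transport plan is determined by the single number $t = \pi(\{(0,0)\})$, and the cost is affine in $t$, so the infimum is attained at an endpoint of the admissible interval.

```python
from fractions import Fraction


def kantorovich_2x2(mu, nu, c):
    """Solve the Kantorovich problem on two-point spaces X = Y = {0, 1}.

    mu, nu: pairs of nonnegative weights summing to 1 (the marginals).
    c: 2x2 cost matrix, c[i][j] is the cost of moving mass from i to j.
    Returns (optimal cost, optimal plan as a 2x2 matrix).
    """
    lo = max(Fraction(0), mu[0] + nu[0] - 1)  # smallest admissible pi[0][0]
    hi = min(mu[0], nu[0])                    # largest admissible pi[0][0]

    def plan(t):
        # The marginal constraints fix the three remaining entries.
        return [[t, mu[0] - t], [nu[0] - t, 1 - mu[0] - nu[0] + t]]

    def cost(pi):
        return sum(c[i][j] * pi[i][j] for i in range(2) for j in range(2))

    # The cost is affine in t, so it suffices to compare the two endpoints.
    candidates = [(cost(plan(t)), plan(t)) for t in (lo, hi)]
    return min(candidates, key=lambda pair: pair[0])


# Hypothetical data: move mass between two points at unit distance.
mu = (Fraction(1, 2), Fraction(1, 2))
nu = (Fraction(1, 4), Fraction(3, 4))
c = [[0, 1], [1, 0]]
best_cost, best_plan = kantorovich_2x2(mu, nu, c)
print(best_cost, best_plan)
```

In this toy case the optimal plan keeps as much mass as possible in place and moves only the surplus, which is the behaviour one expects from a distance-type cost.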

In this situation it is quite natural to consider not the set of all transport plans, but the set
of plans that are invariant w.r.t. the action of $G$ on $X \times X$. The corresponding Kantorovich
problem then has the form:

$$\inf\left\{ \int c \, d\pi \;:\; \pi \in \mathcal{P}_G(X \times X),\ \mathrm{Pr}_1(\pi) = \mu,\ \mathrm{Pr}_2(\pi) = \nu \right\}.$$

This is an example of a Kantorovich problem with additional linear restrictions. This modification
of the Kantorovich problem was independently formulated and studied in [?] and [27]. In our
case the additional restriction is the restriction of invariance.

Since $\mu$ and $\nu$ are elements of the simplex $\mathcal{P}_G(X)$, they can be uniquely represented as follows:

$$\mu = \int_{\partial_e(\mathcal{P}_G(X))} \xi_\alpha \, d\hat\mu(\alpha) =: \mathrm{bar}(\hat\mu), \qquad \nu = \int_{\partial_e(\mathcal{P}_G(X))} \xi_\beta \, d\hat\nu(\beta) =: \mathrm{bar}(\hat\nu),$$

i.e. as barycenters of probability measures concentrated on the boundary of $\mathcal{P}_G(X)$:
$\hat\mu, \hat\nu \in \mathcal{P}(\partial_e(\mathcal{P}_G(X)))$. It is natural to ask whether it is possible to decompose the Kantorovich
problem in the following way:


$$\inf\left\{ \int c \, d\pi : \pi \in \mathcal{P}_G(X \times X),\ \mathrm{Pr}_1(\pi) = \mu,\ \mathrm{Pr}_2(\pi) = \nu \right\} = \inf\left\{ \int \inf\left\{ \int c \, d\pi : \pi \in \mathcal{P}_G(X \times X),\ \mathrm{Pr}_1(\pi) = \xi_\alpha,\ \mathrm{Pr}_2(\pi) = \xi_\beta \right\} d\hat\pi \;:\; \hat\pi \in \Pi(\hat\mu, \hat\nu) \right\}, \tag{1.1}$$

where $\Pi(\hat\mu, \hat\nu) := \{\hat\pi \in \mathcal{P}(\partial_e(\mathcal{P}_G(X)) \times \partial_e(\mathcal{P}_G(X))) : \mathrm{Pr}_1(\hat\pi) = \hat\mu,\ \mathrm{Pr}_2(\hat\pi) = \hat\nu\}$. In other words,
is it possible to split the Kantorovich problem into two sub-problems: the first one is to calculate
the minimal cost of transportation between any pair of measures from the boundary of the
simplex, and the second one is to find an optimal way to transport measures on the boundary,
i.e. measures on the set of ergodic measures? In this paper we describe sufficient conditions
for the existence of such a decomposition. It turns out that this decomposition result holds for a
special kind of simplexes $\mathrm{Dom} \subset \mathcal{P}(X)$, known as ergodic decomposable simplexes ([11], [17]).
The simplex of invariant measures described above is an example of an ergodic decomposable
simplex. In particular, the following statement is valid.

Theorem 1.1. Let $(X, \mathcal{A})$, $(Y, \mathcal{B})$ be two Polish spaces with Borel $\sigma$-algebras and with given
continuous actions of the group $(\mathbb{Z}, +)$, let $\mathcal{A}^{inv} \subset \mathcal{A}$, $\mathcal{B}^{inv} \subset \mathcal{B}$ be the corresponding $\sigma$-subalgebras
of invariant sets, and let $c\colon X \times Y \to \mathbb{R}$ be a lower semicontinuous function bounded below.
Then the equality (1.1) is valid, where $\hat\mu$ is the restriction of $\mu$ to $\mathcal{A}^{inv}$, and $\hat\nu$ is the restriction
of $\nu$ to $\mathcal{B}^{inv}$.


This theorem is a particular case of the main theorem of the paper (Theorem 5.5). To
prove it, consider the ergodic decomposition of an optimal invariant transport plan. It is
not hard to check that this decomposition consists of ergodic transport plans with ergodic
marginals. It remains to show that almost all plans in this decomposition are optimal. If
this is not the case, one can replace the non-optimal components with optimal ones and obtain a
contradiction. There is a technical difficulty here, however: in making such a replacement one
needs to be careful with measurability. The family of transport plans that make up the
decomposition must still depend measurably on the marginals. This issue is discussed in
detail in Section 4 of the paper.

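Let $\mathrm{Dom} \subset \mathcal{P}(X)$ be a simplex that admits a decomposition of the Kantorovich problem. Denote
for brevity $E := \partial_e(\mathrm{Dom}) \subset \mathrm{Dom}$. Assume that a distance function $d\colon E \times E \to \mathbb{R}_{\ge 0}$ is
defined on $E$, and that $d$ metrizes the topology of $E$. If we fix some $p \in [1, \infty)$, we can extend
$d$ to a distance function $\hat d_p$ on Dom via the formula:

$$\hat d_p(\mu, \nu) := \inf\left\{ \left( \int d^p(e_1, e_2) \, d\hat\pi \right)^{1/p} : \hat\pi \in \mathcal{P}(E \times E),\ \mathrm{Pr}_1(\hat\pi) = \hat\mu,\ \mathrm{Pr}_2(\hat\pi) = \hat\nu \right\}, \tag{1.2}$$

where $\mathcal{P}(E \times E)$ is the set of all Borel probability measures on $E \times E$, $\mathrm{Pr}_k$ is the operator
sending a measure from $\mathcal{P}(E \times E)$ to its $k$-th marginal, $\mathrm{bar}(\hat\mu) = \mu$, $\mathrm{bar}(\hat\nu) = \nu$.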

Let us recall the definition of the $L^p$-Wasserstein distance in the case where the measures are
defined on a compact metric space.

Definition 1.2. Let $(X, d)$ be a compact metric space, $\mathcal{P}(X)$ be the set of all Borel (= Radon)
probability measures on it. Then for any $p \in [1, \infty)$ the $L^p$-Wasserstein distance
$W_p\colon \mathcal{P}(X) \times \mathcal{P}(X) \to \mathbb{R}_{\ge 0}$ is the distance function defined by the formula:

$$W_p(\mu, \nu) := \inf\left\{ \left( \int d^p(x, y) \, d\pi \right)^{1/p} : \pi \in \mathcal{P}(X \times X),\ \mathrm{Pr}_1(\pi) = \mu,\ \mathrm{Pr}_2(\pi) = \nu \right\}. \tag{1.3}$$
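
As an illustration of formula (1.3), the following Python sketch computes $W_p$ in a special case that is not treated in the paper: $X \subset \mathbb{R}$ with $d(x, y) = |x - y|$, and two empirical measures with the same number of atoms of equal mass. It relies on the classical fact that on the real line the monotone (sorted) coupling is optimal for every $p \ge 1$. The function name `wasserstein_1d` and the sample points are illustrative assumptions.

```python
def wasserstein_1d(xs, ys, p=1):
    """L^p-Wasserstein distance between two empirical measures on the line.

    xs, ys: lists of equal length; each point carries mass 1/len(xs).
    Assumes d(x, y) = |x - y| and p >= 1, so that matching the sorted
    samples in order is an optimal transport plan.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same number of points")
    if p < 1:
        raise ValueError("p must be at least 1")
    n = len(xs)
    total = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (total / n) ** (1 / p)


# Hypothetical samples.
print(wasserstein_1d([0.0, 1.0, 3.0], [5.0, 4.0, 6.0], p=1))
print(wasserstein_1d([0.0, 2.0], [1.0, 1.0], p=2))
```

For general compact metric spaces no such closed form is available, and (1.3) has to be treated as a linear program over transport plans.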


Considering the invariant Kantorovich problem, it is natural to introduce the following analogue
of the $L^p$-Wasserstein distance on $\mathrm{Dom} := \mathcal{P}_G(X)$:

$$W_p^G(\mu, \nu) := \inf\left\{ \left( \int d^p(x, y) \, d\pi \right)^{1/p} : \pi \in \mathcal{P}_G(X \times X),\ \mathrm{Pr}_1(\pi) = \mu,\ \mathrm{Pr}_2(\pi) = \nu \right\}. \tag{1.4}$$

It is known (although it is not a trivial fact) that if $d$ is invariant w.r.t. $G$, the function $W_p^G$
coincides with $W_p$ on Dom (see Moameni, [?]). We do not assume invariance of the distance $d$,
thus in general only $W_p^G \ge W_p$ holds on their common domain of definition, because the infimum
in (1.4) is taken over a smaller set of plans. It can be checked that $W_p^G$ actually
satisfies every axiom of a distance function on $\mathcal{P}_G(X)$ (see Example 3.14 for the proof).

Question 1.3. Suppose that for some $p \in [1, +\infty)$ the distance $W_p^G$ is defined on Dom. Is it possible to
define a distance $d$ on $E = \partial_e(\mathrm{Dom})$ in such a way that its extension $\hat d_p$ to Dom coincides
with $W_p^G$?

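Let $d_p^G$ be the restriction of the invariant $L^p$-Wasserstein distance (defined via formula (1.4))
to $\mathrm{Erg}(X) := \partial_e(\mathrm{Dom})$, i.e.

$$(d_p^G)^p(\xi_\alpha, \xi_\beta) := \inf\left\{ \int d^p(x, y) \, d\pi : \pi \in \mathcal{P}_G(X \times X),\ \mathrm{Pr}_1(\pi) = \xi_\alpha,\ \mathrm{Pr}_2(\pi) = \xi_\beta \right\}. \tag{1.5}$$

Then it is true that

$$W_p^G(\mu, \nu) = \hat d_p^G(\mu, \nu)$$

for every $\mu, \nu \in \mathrm{Dom}$. Here $\hat d_p^G$ is the extension of $d_p^G$ (1.5) from $E := \partial_e(\mathrm{Dom})$ to Dom defined
by the formula (1.2).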
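
The quantity (1.5) can be computed explicitly in a finite toy model, which is an assumption made here for illustration and not a setting from the paper: $X = \{0, \dots, n-1\}$ with the group $\mathbb{Z}$ acting through a permutation $T$. The ergodic measures are then the uniform measures on the orbits of $T$, and every invariant plan between two of them is a convex combination of uniform measures on orbits of the diagonal map $(x, y) \mapsto (T x, T y)$, so the infimum is a minimum over finitely many diagonal orbits. The names `orbit` and `ergodic_cost` are hypothetical.

```python
from fractions import Fraction


def orbit(T, x):
    """Orbit of x under the permutation T, where T[i] is the image of i."""
    seen, y = [], x
    while y not in seen:
        seen.append(y)
        y = T[y]
    return seen


def ergodic_cost(T, cost, a, b):
    """Value of (1.5) between the uniform measures on orbit(a) and orbit(b).

    cost(u, v) plays the role of d^p(u, v) and is assumed integer-valued.
    Every diagonal orbit inside orbit(a) x orbit(b) contains a pair whose
    first coordinate is a, so it suffices to start from the pairs (a, y).
    """
    best = None
    for y in orbit(T, b):
        pairs, (u, v) = [], (a, y)
        while (u, v) not in pairs:      # walk along one diagonal orbit
            pairs.append((u, v))
            u, v = T[u], T[v]
        average = Fraction(sum(cost(u, v) for u, v in pairs), len(pairs))
        best = average if best is None else min(best, average)
    return best


# Hypothetical data: T swaps 0 <-> 1 and 2 <-> 3; cost is |u - v|^2.
T = [1, 0, 3, 2]
squared = lambda u, v: (u - v) ** 2
print(ergodic_cost(T, squared, 0, 2))
print(ergodic_cost(T, squared, 0, 0))
```

In this finite model the outer problem in (1.2) is an ordinary Kantorovich problem between the orbit weights of $\mu$ and $\nu$, with the values returned by `ergodic_cost` as the cost matrix.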

This fact is a particular case of Theorem 6.1. It is clear that it is closely related to the
decomposition of the Kantorovich problem. Denote by $(\mathcal{P}_G(X), W_p^G(d))$ the space of all
invariant probability measures on a compact metric space $(X, d)$, equipped with the invariant
$L^p$-Wasserstein distance. Since

$$\partial_e(\mathrm{Dom}) \subset \mathrm{Dom} = \mathcal{P}_G(X),$$

one can restrict $W_p^G(d)$ to the subsets Dom and $\partial_e(\mathrm{Dom})$ and obtain the metric spaces $(\mathrm{Dom}, W_p^G(d))$
and $(\partial_e(\mathrm{Dom}), W_p^G(d))$ respectively. By the definition of a simplex,

$$\mathrm{Dom} \simeq \mathcal{P}(\partial_e(\mathrm{Dom})),$$

where $\simeq$ stands for homeomorphism. Hence one can construct the $L^p$-Wasserstein distance on
$\mathcal{P}(\partial_e(\mathrm{Dom}))$ considering $(\partial_e(\mathrm{Dom}), W_p^G(d))$ as the original metric structure. Denote the
constructed metric space by $(\mathcal{P}(\partial_e(\mathrm{Dom})), W_p(W_p^G(d)))$. Due to the decomposition result,

$$(\mathcal{P}(\partial_e(\mathrm{Dom})), W_p(W_p^G(d))) = (\mathrm{Dom}, W_p^G(d)),$$

where "=" stands for isometric isomorphism. We shall call results of this type decompositions
of a Wasserstein distance.


We are going to prove a general result about decompositions of transport plans, of which
the described decompositions of Wasserstein distances are particular cases. The
main theorems of this paper are Theorem 5.5 and Theorem 6.1. The first one establishes a
decomposition result for a Kantorovich problem, while the second one describes a decomposition
for Wasserstein-like distances on simplexes. The preliminary section of the paper contains all
required definitions and facts from the theory of simplexes, sufficient statistics, and ergodic
decompositions of measures. In describing this theory, we mostly follow [11] and [17].

In Section 3 we define the Kantorovich problem with additional restrictions. We also
formulate notions of "good" (in a precise sense) linear restrictions: weakly regular, ergodic
decomposable, coherent and geometric ones. These notions are important for the formulation
of the main results.

In Section 4 we prove a measurable selection statement, based on the results of
Rieder [22] about measurability of solutions in optimization problems.

In Section 5 we formulate and prove the main statement about the existence of a decomposition
of a Kantorovich problem in the case where the additional restriction is ergodic decomposable, weakly
regular, and coherent. In Section 6 we prove the result about the decomposition of Wasserstein-like
distances in the case where the additional restriction is geometric (in addition to the previous
assumptions).

We also describe some possible applications of the obtained results. These applications are
closely related to the study of symmetric and invariant modifications of the Kantorovich problem,
which are studied, for example, in [?], [?], [18], [?].

2. Preliminaries: simplexes and disintegration of measures 

In this section we briefly discuss definitions and results from the theory of simplexes, ergodic 
decompositions and disintegrations of measures, that will be used in the rest of the paper. 

The standard way to define a notion of infinite-dimensional simplex is the approach of 
Choquet. 

Definition 2.1. A point $x \in K$ of a convex compact set $K$ is extreme iff for any two points
$a, b \in K$ such that $x = t \cdot a + (1 - t) \cdot b$ for some $t \in (0, 1)$, it follows that $a = b$. A compact
convex metrizable set $K$ is a Choquet simplex iff for every element $a \in K$ there
is a unique Borel probability measure $\mu_a$ on $K$ such that

• $\mu_a(\partial_e(K)) = 1$, where $\partial_e(K)$ is the subset consisting of all extreme points of $K$,

• $a = \mathrm{bar}(\mu_a) := \int_K x \, d\mu_a$.

An affine map $T\colon K_1 \to K_2$ between two convex sets is a map with the property
$T(\alpha \cdot a + (1 - \alpha) \cdot b) = \alpha \cdot T(a) + (1 - \alpha) \cdot T(b)$ for all $\alpha \in [0, 1]$, $a, b \in K_1$. Two simplexes are isomorphic iff
there exists an affine homeomorphism between them. There is only one $n$-dimensional Choquet
simplex up to isomorphism for any finite $n$.

A Bauer simplex is a Choquet simplex with closed (hence compact) Choquet boundary.
There are infinitely many non-isomorphic Bauer simplexes. A Poulsen simplex is a Choquet
simplex with dense Choquet boundary. There is only one (up to isomorphism) Poulsen simplex
(see [?] for details).

Example 2.2. Let $(X, d)$ be a compact metric space, $\mathcal{P}(X)$ be the set of all probability
measures equipped with the weak*-topology induced by the embedding $\mathcal{P}(X) \subset C(X)^*$. Then
$\mathcal{P}(X)$ is a Bauer simplex, and its Choquet boundary consists of all Dirac measures. Every Bauer
simplex is isomorphic to $\mathcal{P}(X)$ for some compact metrizable space $X$ (see [3]).

Example 2.3. Let $(X, d)$ be a compact metric space, $G$ be an amenable group acting on $X$
by homeomorphisms, $\mathcal{P}_G(X)$ be the set of all invariant probability measures equipped with
the weak*-topology induced by the embedding $\mathcal{P}(X) \subset C(X)^*$. Then $\mathcal{P}_G(X)$ is a Choquet
simplex, and its Choquet boundary consists of all ergodic measures. Every Choquet simplex is
isomorphic to $\mathcal{P}_G(X)$ for some compact metrizable $X$ and some continuous action of the group
$\mathbb{Z}$ (see [9], [10]).

Example 2.4. Let $\mathcal{C}$ be a C*-algebra, $S(\mathcal{C})$ be its state space, i.e. the set of all positive
continuous linear functionals of norm one, and let $S(\mathcal{C})$ be equipped with the weak*-topology induced by the
embedding $S(\mathcal{C}) \subset \mathcal{C}^*$. Then $S(\mathcal{C})$ is a compact convex set. It is a Choquet simplex if and only
if $\mathcal{C}$ is a commutative algebra (see [?]).

Example 2.5. Let $(X, d)$ be a Polish metric space, $\mathcal{P}(X)$ be the set of all probability measures
equipped with the weak*-topology induced by the embedding $\mathcal{P}(X) \subset C(X)^*$. Then $\mathcal{P}(X)$ is
a Choquet (and Bauer) simplex if and only if $X$ is compact.

This approach to the notion of simplex has two substantial disadvantages:

• First, all Choquet simplexes are compact by definition. This
is convenient in many cases, but spaces of measures equipped with the Kantorovich
metric are not compact in general.

• Second, and arguably more important: Choquet theory does
not explicitly link the representation of an element of a simplex as a mixture of
extreme points with an ergodic decomposition of the measure corresponding to that
element.

The approach introduced by Dynkin in [11] and developed subsequently by several authors
in [?] is free of these disadvantages. The notion of simplex provided by Dynkin is a
generalization of the one introduced by Choquet. It is formulated in purely measure-theoretic terms
and does not require any topological assumptions. For a special subclass of Dynkin simplexes
(called ergodic decomposable in [17]) there is a result that connects the representation of a
measure as a mixture of extreme points with its disintegration w.r.t. an appropriate $\sigma$-subalgebra.
The conditional measures of this disintegration correspond to the extreme points of the simplex.

Let us denote by $B(X, \mathcal{A})$ the set of all $\mathcal{A}$-measurable bounded real-valued functions on $X$.

Definition 2.6. Let $(X, \mathcal{A})$ be a measurable space, $M \subset \mathcal{P}(X)$ be a subset of probability
measures. The smallest $\sigma$-algebra on $M$ such that the map $\mu \mapsto \mu(f)$, $\mu \in M$, is measurable
for every $f \in B(X, \mathcal{A})$ is called the standard $\sigma$-algebra.

Definition 2.7. Let $(X, \mathcal{A})$ be a measurable space, $M \subset \mathcal{P}(X)$ be a subset equipped with the
standard $\sigma$-algebra $\mathcal{B}$. Let us define the barycenter map $\mathrm{bar}\colon \mathcal{P}(M) \to M$ by the formula

$$\mathrm{bar}(\hat\mu)(f) = \int_M \left( \int_X f \, d\nu \right) d\hat\mu(\nu), \quad \forall f \in B(X, \mathcal{A}).$$

The boundary of $(M, \mathcal{B})$ is the set $\partial_e(M) \subset M$ of points $m$ such that for any measure $\hat\mu \in \mathcal{P}(M)$
the property $\mathrm{bar}(\hat\mu) = m$ implies $\hat\mu(\{m\}) = 1$. A measurable space $(M, \mathcal{B})$ is called a Dynkin
simplex iff its boundary $\partial_e(M)$ is measurable, and for any $\mu \in M$ there exists a unique
$\hat\mu \in \mathcal{P}(M)$ s.t. $\mathrm{bar}(\hat\mu) = \mu$ and $\hat\mu(\partial_e(M)) = 1$.

Example 2.8. If $M := \mathcal{P}(X)$, then it is a Dynkin simplex with boundary $\partial_e(M)$ consisting of
all Dirac measures.

Example 2.9. Let $(X, \mathcal{A})$ be a measurable space, $G$ be an amenable group acting on $X$ by
measurable transformations, $\mathcal{P}_G(X)$ be the set of all invariant measures. If $M = \mathcal{P}_G(X)$ and $\mathcal{B}$ is
the $\sigma$-algebra generated by evaluations, then $(M, \mathcal{B})$ is a Dynkin simplex, and its Dynkin boundary
consists of all ergodic measures. Since any Choquet simplex is isomorphic to $\mathcal{P}_G(X)$ for some
compact metrizable $X$ and some continuous action of the group $\mathbb{Z}$ (see [9], [10]), every Choquet
simplex is a Dynkin simplex, up to an isomorphism.

Let (X, A) be some measurable space. 

Definition 2.10. A function $Q\colon X \times \mathcal{A} \to [0, 1]$ is called a Markov kernel on $(X, \mathcal{A})$ iff for
each $x \in X$, $Q_x := Q(x, \cdot)$ is a probability measure on $\mathcal{A}$, and for each $A \in \mathcal{A}$, $x \mapsto Q(x, A)$ is
an $\mathcal{A}$-measurable function.

We call two Markov kernels $Q_1$, $Q_2$ on $(X, \mathcal{A})$ $M$-equivalent for some $M \subset \mathcal{P}(X)$ iff
$\mu(\{x : Q_1(x, \cdot) = Q_2(x, \cdot)\}) = 1$ for all $\mu \in M$.

Definition 2.11. Let $(X, \mathcal{A})$, $(Y, \mathcal{B})$ be two measurable spaces. A function $Q\colon Y \times \mathcal{A} \to [0, 1]$
is called a Markov transition kernel from $(X, \mathcal{A})$ to $(Y, \mathcal{B})$ iff for each $y \in Y$, $Q_y := Q(y, \cdot)$ is
a probability measure on $(X, \mathcal{A})$, and for each $A \in \mathcal{A}$, $y \mapsto Q(y, A)$ is a $\mathcal{B}$-measurable function.

We call two Markov transition kernels $Q_1$, $Q_2$ from $(X, \mathcal{A})$ to $(Y, \mathcal{B})$ $M$-equivalent for some
$M \subset \mathcal{P}(Y)$ iff $\mu(\{y : Q_1(y, \cdot) = Q_2(y, \cdot)\}) = 1$ for all $\mu \in M$. Each Markov kernel $Q$ defines a
positive operator on $B(X, \mathcal{A})$ by the formula

$$Q(f)(x) := \int f \, dQ(x, \cdot).$$

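Definition 2.12. Let $M \subset \mathcal{P}(X, \mathcal{A})$, and let $\mathcal{A}^0$ be a $\sigma$-subalgebra of $\mathcal{A}$. A Markov kernel $Q$ on
$(X, \mathcal{A})$ is called a decomposition for the triple $(\mathcal{A}, \mathcal{A}^0, M)$ iff

$$E_\mu(f \mid \mathcal{A}^0) = Q(f) \quad \mu\text{-a.e.}$$

for all $f \in B(X, \mathcal{A})$, $\mu \in M$. Here $E_\mu(f \mid \mathcal{A}^0)$ is the conditional expectation of $f$ with respect to the
$\sigma$-algebra $\mathcal{A}^0$ and the measure $\mu$.

Note that a decomposition for a triple $(\mathcal{A}, \mathcal{A}^0, M)$ is by definition a Markov transition kernel
from $(X, \mathcal{A})$ to $(X, \mathcal{A}^0)$.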

Define the action of a Markov kernel on $\mathcal{P}(X, \mathcal{A})$ as follows:

$$Q_\#(\mu)(A) := \int_X Q(x, A) \, d\mu(x) \quad \forall A \in \mathcal{A}.$$

A measure $\mu \in \mathcal{P}(X, \mathcal{A})$ is said to be invariant w.r.t. $Q$ iff $Q_\#(\mu) = \mu$.

Analogously, for any Markov transition kernel there is a map from $B(X, \mathcal{A})$ to $B(Y, \mathcal{B})$:

$$Q(f)(y) := \int_X f \, dQ(y, \cdot)$$

and a map from $\mathcal{P}(Y, \mathcal{B})$ to $\mathcal{P}(X, \mathcal{A})$:

$$Q_\#(\nu)(A) := \int_Y Q(y, A) \, d\nu(y) \quad \forall A \in \mathcal{A}.$$

Definition 2.13. $M \subset \mathcal{P}(X)$ is separable iff $\mathcal{A}$ contains a countable family $\mathcal{F}$ of subsets
such that for each pair of different measures $\mu_1, \mu_2 \in M$ there exists $A \in \mathcal{F}$ s.t. $\mu_1(A) \ne \mu_2(A)$.

Definition 2.14. A $\sigma$-algebra $\mathcal{A}^0 \subset \mathcal{A}$ is called sufficient for a separable $M \subset \mathcal{P}(X)$ if there
is a Markov kernel $Q$ on $(X, \mathcal{A})$ that is a decomposition for the triple $(\mathcal{A}, \mathcal{A}^0, M)$.

Definition 2.15. A sufficient $\sigma$-algebra is called $H$-sufficient for $M$ if

$$\mu(\{x : Q_x \in M\}) = 1 \quad \forall \mu \in M.$$

We call two $\sigma$-subalgebras $\mathcal{A}^1 \subset \mathcal{A}$, $\mathcal{A}^2 \subset \mathcal{A}$ $M$-equivalent for some $M \subset \mathcal{P}(X, \mathcal{A})$ iff for any
$\mu \in M$ and any $A_1 \in \mathcal{A}^1$ there exists $A_2 \in \mathcal{A}^2$ s.t. $A_1 = A_2$ a.e. w.r.t. $\mu$.

The following important theorem combines the statements of Theorem 3.1, Theorem
3.2, Theorem 3.3 of [?] and Lemma 3.6 of [?].

Theorem 2.16. Assume that $\mathcal{A}$ is a countably generated $\sigma$-algebra. Then for a separable set
$M \subset \mathcal{P}(X, \mathcal{A})$, the following properties are equivalent.

• There exists a unique, up to $M$-equivalence, $H$-sufficient $\sigma$-algebra $\mathcal{A}^0 \subset \mathcal{A}$ for $M$,
which is the $\sigma$-algebra of all sets $A \in \mathcal{A}$ s.t. $\mu(A) = 0$ or $\mu(A) = 1$ $\forall \mu \in M$.

• There exists a Markov kernel $Q$ on $(X, \mathcal{A})$ with the property

$$Q(g Q(f)) = Q(g) Q(f) \quad \forall f, g \in B(X, \mathcal{A}),$$

or, equivalently, with the property

$$Q_x(\{y \in X : Q_x = Q_y\}) = 1 \quad \forall x \in X,$$

such that $M$ is the set of all $Q$-invariant elements of $\mathcal{P}(X, \mathcal{A})$.

For a Markov kernel as in the second property, the corresponding $H$-sufficient $\sigma$-algebra is
(up to $M$-equivalence) the $\sigma$-algebra of all $Q$-invariant measurable sets.
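
The two equivalent kernel properties in Theorem 2.16 can be checked by hand in a finite toy model. The sketch below is an illustration under assumptions that do not come from the paper: $X = \{0, \dots, n-1\}$, a permutation `T` (a hypothetical choice), and the kernel that sends $x$ to the uniform measure on the orbit of $x$. The names `orbit_kernel` and `apply_kernel` are hypothetical.

```python
from fractions import Fraction


def orbit_kernel(T):
    """Q[x][y] = 1/|orbit(x)| if y lies in the T-orbit of x, and 0 otherwise."""
    n = len(T)
    Q = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        orb, y = [], x
        while y not in orb:
            orb.append(y)
            y = T[y]
        for y in orb:
            Q[x][y] = Fraction(1, len(orb))
    return Q


def apply_kernel(Q, f):
    """(Qf)(x) = sum over y of Q(x, {y}) * f(y)."""
    return [sum(q * fy for q, fy in zip(row, f)) for row in Q]


T = [1, 2, 0, 4, 3]                 # hypothetical: cycles (0 1 2) and (3 4)
Q = orbit_kernel(T)
f, g = [3, 1, 4, 1, 5], [2, 7, 1, 8, 2]

# First form of the property: Q(g * Q(f)) = Q(g) * Q(f), pointwise.
Qf, Qg = apply_kernel(Q, f), apply_kernel(Q, g)
lhs = apply_kernel(Q, [gx * qfx for gx, qfx in zip(g, Qf)])
rhs = [a * b for a, b in zip(Qg, Qf)]

# Second form: Q_x gives full mass to the set {y : Q_x = Q_y}.
same_kernel_mass = [
    sum(Q[x][y] for y in range(len(T)) if Q[y] == Q[x]) for x in range(len(T))
]
print(lhs == rhs, same_kernel_mass)
```

In this model the $Q$-invariant measures are exactly those that are uniform on each orbit, i.e. the $T$-invariant ones, and the $Q$-invariant sets are the unions of orbits.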

Definition 2.17. A separable set $M \subset \mathcal{P}(X, \mathcal{A})$ of measures on a countably generated
$\sigma$-algebra $\mathcal{A}$ is called an ergodic decomposable simplex iff any of the equivalent properties of
Theorem 2.16 is satisfied.

In [11] Dynkin provides a series of examples of ergodic decomposable simplexes. In particular,
the simplexes from Examples 2.8 and 2.9 are ergodic decomposable.

The following result is a direct consequence of Theorem 3.1 from [?] and Remark 3.8 from [?].

Theorem 2.18. An ergodic decomposable simplex $M \subset \mathcal{P}(X, \mathcal{A})$ is a Dynkin simplex.
Moreover, there exists a Markov kernel $Q$ from the definition of an ergodic decomposable simplex
such that the Dynkin boundary is given by

$$\partial_e(M) = \{Q_x : x \in X\}$$

and

$$\mathrm{bar}(\hat\mu) = \mu \iff \hat\mu(S) = \mu(\{x : Q_x \in S\})$$

for any measurable $S \subset M$.

The following statement establishes that (in the case of a Polish space $X$) the
Borel $\sigma$-algebra generated by the topology of weak convergence on $\mathcal{P}(X)$ coincides with the standard
$\sigma$-algebra on $\mathcal{P}(X)$.

Proposition 2.19. Let $(X, \mathcal{A})$ be a Polish space with Borel $\sigma$-algebra. Then the following
three $\sigma$-algebras on $\mathcal{P}(X)$ coincide:

(1) the standard $\sigma$-algebra $\mathcal{E}$,

(2) the $\sigma$-algebra generated by all sets of the form $\{\mu \in \mathcal{P}(X) : a < \mu(A) < b\}$, $A \in \mathcal{A}$,
$a, b \in \mathbb{Q} \cap [0, 1]$,

(3) the Borel $\sigma$-algebra generated by the topology of weak convergence on $\mathcal{P}(X)$.

Proof. Let us show first that $\mathcal{E}$ can be generated by all functionals of the form $\mu \mapsto \mu(A)$, $A \in \mathcal{A}$.
If all maps $\mu \mapsto \mu(A)$ are measurable, then the map $\mu \mapsto \mu(s) := \int s \, d\mu$ is
measurable for every step function $s$ (a step function is a linear combination of finitely many
indicator functions of measurable sets). By the definition of the Lebesgue integral, $\mu(f) := \int f \, d\mu :=
\sup\{\mu(s) : 0 \le s \le f^+\} - \sup\{\mu(s) : 0 \le s \le f^-\}$ (where $f = f^+ - f^-$, $f^+$, $f^-$ are non-negative
and measurable, $s$ is a measurable step function). Each supremum can be taken over a countable
increasing sequence of step functions, so the measurability of $\mu \mapsto \mu(f)$ follows from the classical
fact that a pointwise supremum of countably many measurable maps is measurable.

Since the family of all intervals with rational endpoints in $[0, 1]$ generates the Borel $\sigma$-algebra
on $[0, 1]$, the standard $\sigma$-algebra coincides with the $\sigma$-algebra generated by all sets of the form
$\{\mu \in \mathcal{P}(X) : a < \mu(A) < b\}$, $A \in \mathcal{A}$, $a, b \in \mathbb{Q} \cap [0, 1]$.

The equivalence of this $\sigma$-algebra with the Borel one (w.r.t. the topology of weak convergence)
was proved in [14] (Theorem 2.3). □


Let us discuss two important examples of ergodic decomposable simplexes. 


Example 2.20 (Main example). Let $G$ be a locally compact amenable group with a fixed
continuous action on a Polish metric space $(X, d)$. The action of $G$ on $X \times X$ is defined in
the "diagonal" way: $g(x_1, x_2) := (g(x_1), g(x_2))$, where $g$ also denotes the action of the element $g \in G$
on $X$. A measure $\mu$ on $(X, d)$ is called invariant w.r.t. $G$ iff $\mu \circ g^{-1} = \mu$ for every $g \in G$.
Denote by $\mathcal{P}_G(X) \subset \mathcal{P}(X)$ the set of all invariant Borel probability measures on $(X, d)$. It
is known that $\mathcal{P}_G(X)$ is a closed ergodic decomposable simplex (see Sections 6 and 7 in [11]
for the proof). The corresponding $H$-sufficient $\sigma$-algebra can be defined as the $\sigma$-algebra $\mathcal{A}^{inv}$
of all $G$-invariant Borel subsets. The corresponding Markov kernel $Q$ on $(X, \mathcal{A})$ is
defined as

$$Q(x, A) := \lim_{n \to \infty} \frac{1}{\nu(F_n)} \int_{F_n} g_\#(\delta_x)(A) \, d\nu(g).$$

Here $(F_n)$ is a Følner sequence for $G$, $\nu$ is a left Haar measure on $G$, and $\delta_x$ is the Dirac measure
concentrated at $x \in X$. Recall that a Følner sequence for $G$ is a sequence of nonempty compact
subsets of $G$ such that

$$\lim_{n \to \infty} \frac{\nu(g F_n \,\triangle\, F_n)}{\nu(F_n)} = 0$$

for each $g \in G$, where $g F_n := \{g f : f \in F_n\}$, and $\triangle$ is the symmetric difference.
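
The averaging formula for $Q$ can be made concrete in the simplest case $G = \mathbb{Z}$, acting on a finite set through a permutation $T$; this finite setting, the permutation below, and the name `folner_average` are assumptions made for illustration only. With the counting measure as Haar measure and $F_n = \{0, \dots, n-1\}$ as Følner sequence, the average under the limit is the empirical measure of the first $n$ points of the orbit of $x$.

```python
from fractions import Fraction


def folner_average(T, x, n):
    """Return (1/n) * sum_{k=0}^{n-1} delta_{T^k x} as a dict point -> mass.

    T is a permutation of {0, ..., len(T)-1} given as a list, and
    F_n = {0, ..., n-1} plays the role of the n-th Folner set of Z.
    """
    masses, y = {}, x
    for _ in range(n):
        masses[y] = masses.get(y, Fraction(0)) + Fraction(1, n)
        y = T[y]
    return masses


T = [1, 2, 0, 4, 3]                  # hypothetical: cycles (0 1 2) and (3 4)
print(folner_average(T, 0, 6))       # n is a multiple of the orbit length
print(folner_average(T, 0, 7))       # n is not a multiple of the orbit length
```

When $n$ is a multiple of the orbit length the average is exactly the uniform measure on the orbit; otherwise each point mass differs from the uniform one by at most $1/n$, which is why the limit exists in this model and equals the ergodic measure carried by the orbit of $x$.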


Example 2.21 (Discrete-time Markov process). Let $(X, \mathcal{A})$ be a Polish space with Borel
$\sigma$-algebra, $Q\colon X \times \mathcal{A} \to [0, 1]$ be a Markov kernel with the property $Q(f Q(g)) = Q(f) Q(g)$.
The sequence of probability measures $\mu_n := Q_\#^n(\mu_0)$, $n \in \mathbb{N}$, is called a discrete-time Markov
process with the initial distribution $\mu_0$ and the transition kernel $Q$.

In this case the set of all $Q$-invariant probability measures on $X$ (stationary distributions)
$\mathcal{P}_Q(X) \subset \mathcal{P}(X)$ is an ergodic decomposable simplex. The simplex is closed if the corresponding
Markov operator $Q_\#\colon \mathcal{P}(X) \to \mathcal{P}(X)$ is weak*-continuous. Its $H$-sufficient $\sigma$-algebra is
equivalent to the $\sigma$-algebra $\mathcal{A}_Q$ of all Borel $Q$-invariant subsets (i.e. those $A \in \mathcal{A}$ for which $Q(I_A) = I_A$,
where $I_A$ is the indicator function of $A$). See Section 9 of [11] for the generalization of this
example to the continuous-time case.

3. Kantorovich problem with additional linear restrictions 

Let $(X, \mathcal{A})$, $(Y, \mathcal{B})$ be two Polish spaces with Borel $\sigma$-algebras. Denote by $\mathcal{P}(X)$, $\mathcal{P}(Y)$,
$\mathcal{P}(X \times Y)$ the sets of all probability measures on $(X, \mathcal{A})$, $(Y, \mathcal{B})$ and $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ respectively.
Note that it is possible to consider sets of measures as subsets of Banach dual spaces: $\mathcal{P}(X) \subset
C_b(X)^*$, $\mathcal{P}(Y) \subset C_b(Y)^*$, $\mathcal{P}(X \times Y) \subset C_b(X \times Y)^*$. Consider the weak*-topologies on the dual spaces
and equip the spaces of probability measures with the topologies induced by the inclusions. It is
known that the resulting topological spaces $\mathcal{P}(X)$, $\mathcal{P}(Y)$, $\mathcal{P}(X \times Y)$ are Polish (see [20], Theorems
6.2 and 6.5). Moreover, their topology coincides with the topology of weak convergence of
measures. Due to this fact, the notions of weak and weak*-topology on the spaces of probability
measures will be used interchangeably.

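Let us define projection operators $\mathrm{Pr}_X\colon \mathcal{P}(X \times Y) \to \mathcal{P}(X)$, $\mathrm{Pr}_Y\colon \mathcal{P}(X \times Y) \to \mathcal{P}(Y)$ as
follows:

$$\mathrm{Pr}_X(\pi)(A) = \pi(A \times Y), \quad \forall A \in \mathcal{A}, \tag{3.1}$$

$$\mathrm{Pr}_Y(\pi)(B) = \pi(X \times B), \quad \forall B \in \mathcal{B}. \tag{3.2}$$

In addition, let us define $\mathrm{Pr}\colon \mathcal{P}(X \times Y) \to \mathcal{P}(X) \times \mathcal{P}(Y)$ as $\mathrm{Pr}(\pi) = (\mathrm{Pr}_X(\pi), \mathrm{Pr}_Y(\pi))$. It is
clear that the defined operators are weakly continuous.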

In Kantorovich theory we are interested in the following sets of measures:

$$\Pi(\mu, \nu) = \{\pi \in \mathcal{P}(X \times Y) : \mathrm{Pr}(\pi) = (\mu, \nu)\}, \quad \mu \in \mathcal{P}(X),\ \nu \in \mathcal{P}(Y).$$

Elements of these sets are called transport plans. Since the map $\mathrm{Pr}$ is continuous, the set $\Pi(\mu, \nu)$
is closed. It is also known to be compact (see [2], [6], [26]).

Let us define a cost functional as a functional $C\colon \mathcal{P}(X \times Y) \to \mathbb{R} \cup \{+\infty\}$ that is affine:

$$\alpha C(\pi) + (1 - \alpha) C(\gamma) = C(\alpha \pi + (1 - \alpha) \gamma)$$

for all $\alpha \in [0, 1]$, $\pi, \gamma \in \mathcal{P}(X \times Y)$, with the convention $0 \cdot (+\infty) := 0$. In most cases we are interested in weakly
lower semi-continuous (l.s.c.) cost functionals. Examples of such cost functionals are
provided by integration of lower semi-continuous functions bounded below. If $c\colon X \times Y \to \mathbb{R}$
is such a function, then $\mathrm{Cost}(\pi) := \int c \, d\pi$ is a cost functional. However, the variety of cost
functionals is not restricted to functionals of this form. For example, a more general class of
virtually continuous functions was introduced in [21]. Integration of a bounded below function
of this class also defines a weakly l.s.c. functional.
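
The projection operators (3.1), (3.2) and the affinity of an integral cost functional are easy to verify on finite spaces, where a plan is a matrix of weights. The following Python sketch is an illustration only; the helper names `marginals`, `product_plan`, `cost` and `mix`, as well as the sample data, are assumptions and not part of the paper.

```python
from fractions import Fraction


def marginals(pi):
    """Return (Pr_X(pi), Pr_Y(pi)) for a plan given as a matrix pi[i][j]."""
    return [sum(row) for row in pi], [sum(col) for col in zip(*pi)]


def product_plan(mu, nu):
    """The independent coupling mu (x) nu, which always lies in Pi(mu, nu)."""
    return [[m * n for n in nu] for m in mu]


def cost(pi, c):
    """Integral cost functional: sum of c[i][j] * pi[i][j]."""
    return sum(cij * pij for crow, prow in zip(c, pi) for cij, pij in zip(crow, prow))


def mix(alpha, pi, gamma):
    """Convex combination alpha * pi + (1 - alpha) * gamma of two plans."""
    return [[alpha * p + (1 - alpha) * g for p, g in zip(pr, gr)]
            for pr, gr in zip(pi, gamma)]


# Hypothetical data on two-point spaces.
mu = [Fraction(1, 2), Fraction(1, 2)]
nu = [Fraction(1, 4), Fraction(3, 4)]
c = [[0, 1], [1, 0]]
pi = product_plan(mu, nu)
gamma = [[Fraction(1, 4), Fraction(1, 4)], [Fraction(0), Fraction(1, 2)]]
alpha = Fraction(1, 3)

print(marginals(pi), marginals(gamma))
print(cost(mix(alpha, pi, gamma), c), alpha * cost(pi, c) + (1 - alpha) * cost(gamma, c))
```

Both plans have the same marginals, so their mixture stays in $\Pi(\mu, \nu)$, and the two printed cost values agree because integration against a fixed function is affine in the measure.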

Let us fix some cost functional $C$ and define two sets of optimal plans:

$$\Pi^{opt}(X \times Y) := \{\pi \in \mathcal{P}(X \times Y) : C(\pi) \le C(\gamma),\ \forall \gamma \in \Pi(\mathrm{Pr}(\pi))\},$$

$$\Pi^{opt}(\mu, \nu) := \Pi(\mu, \nu) \cap \Pi^{opt}(X \times Y).$$

In some applications the Kantorovich problem appears in a modified form. For example,
one may be interested in searching for an optimal element not in the set of all transport plans,
but only among invariant (in some defined sense) ones. Such a problem is sometimes called the
invariant or symmetric Kantorovich problem. Another example is the martingale Kantorovich
problem, where the optimal solution is sought in the set of plans that are martingales. A fruitful way to
formalize and generalize such modifications is the notion of the Kantorovich problem with an
additional linear restriction, which is the main object of interest in this section.

Many aspects of the Kantorovich problem in the symmetric context were described in [?].
See also [8] for results about the symmetric Monge-Kantorovich problem in the multi-marginal
setting, and [3] and [?] for duality and monotonicity results about the Kantorovich problem with an
additional linear restriction. Let us start with a formalization of the notion of an additional linear
restriction.

Definition 3.1. For a given pair of measurable spaces $(X, \mathcal{A})$, $(Y, \mathcal{B})$, a linear
restriction is a triple $R = (\Omega, M^X, M^Y)$ consisting of a subset of measurable functions
$\Omega \subset L^0(X \times Y, \mathcal{A} \otimes \mathcal{B})$ and two nonempty sets of measures $M^X \subset \mathcal{P}(X)$, $M^Y \subset \mathcal{P}(Y)$.

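Let us define the restricted sets of transport plans as follows:

$$\Pi_R(X \times Y) := \left\{\pi \in \mathcal{P}(X \times Y) : \int \omega \, d\pi = 0\ \ \forall \omega \in \Omega,\ \mathrm{Pr}_X(\pi) \in M^X,\ \mathrm{Pr}_Y(\pi) \in M^Y \right\}, \tag{3.3}$$

$$\Pi_R(\mu, \nu) := \{\pi \in \Pi_R(X \times Y) : \mathrm{Pr}_X(\pi) = \mu,\ \mathrm{Pr}_Y(\pi) = \nu\}. \tag{3.4}$$

Two linear restrictions $R_1$, $R_2$ are called equivalent iff $\Pi_{R_1}(X \times Y) = \Pi_{R_2}(X \times Y)$.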

Everywhere in this section we assume that both $(X, \mathcal{A})$, $(Y, \mathcal{B})$ are Polish spaces with Borel
$\sigma$-algebras.

Let us formulate a property of a linear restriction, which will be used in the next section to 
obtain an existence result for measurable selection. 

Definition 3.2. Let us call a linear restriction $R = (\Omega, M^X, M^Y)$ weakly regular iff

(1) $M^X$, $M^Y$ are closed in the topology of weak convergence,

(2) the functional $\pi \mapsto \int \omega \, d\pi$ is weakly continuous on $\Pi(M^X, M^Y)$ for every $\omega \in \Omega$,

(3) the set $\Pi_R(\mu, \nu)$ is nonempty for any $\mu \in M^X$, $\nu \in M^Y$.

Proposition 3.3. Let $R = (\Omega, M^X, M^Y)$ be a weakly regular linear restriction. Then $\Pi_R(X \times Y)$
is closed, and $\Pi_R(\mu, \nu)$ is compact for any pair of measures $(\mu, \nu) \in M^X \times M^Y$. Both spaces,
$\Pi_R(X \times Y)$ and $\Pi_R(\mu, \nu)$, are Polish.

Proof. It follows from the fact that $\Pi(\mu, \nu)$ is compact in the topology of weak convergence
(see [6]), and that $\Pi_R(\mu, \nu)$ is a closed subset of it by conditions (1) and (2) of Definition 3.2. □

Let us provide an (obvious) condition on a linear restriction $R$ which is sufficient for its
weak regularity.

Remark 3.4. If $\Omega \subset C_b(X \times Y)$, $M^X \subset \mathcal{P}(X)$ and $M^Y \subset \mathcal{P}(Y)$ are weakly closed and
nonempty, and $\mu \otimes \nu \in \Pi_R(\mu, \nu)$ for any pair of measures $\mu \in M^X$, $\nu \in M^Y$, then $R =
(\Omega, M^X, M^Y)$ is a weakly regular restriction.

Let us fix, in addition to the restriction $R$, some cost functional $C$. We define two sets of
restricted optimal plans:

$$\Pi_R^{opt}(X \times Y) := \{\pi \in \Pi_R(X \times Y) : C(\pi) \le C(\gamma)\ \ \forall \gamma \in \Pi_R(\mathrm{Pr}(\pi))\}, \tag{3.5}$$

$$\Pi_R^{opt}(\mu, \nu) := \Pi_R(\mu, \nu) \cap \Pi_R^{opt}(X \times Y). \tag{3.6}$$

Proposition 3.5. If $R = (\Omega, M^X, M^Y)$ is a weakly regular restriction, and $C$ is weakly l.s.c.,
then $\Pi_R^{opt}(\mu, \nu)$ is a compact subspace of $\mathcal{P}(X \times Y)$ (for any $\mu \in M^X$, $\nu \in M^Y$).

Proof. By the definition of lower semi-continuity of a functional, its lower level sets are closed. Thus
the set $\{\pi \in \Pi_R(\mu, \nu) : C(\pi) \le a\}$ is closed for any $a \in \mathbb{R}$. Let $a = \inf\{C(\pi) : \pi \in \Pi_R(\mu, \nu)\}$;
then the corresponding level set is $\Pi_R^{opt}(\mu, \nu)$. Since it is a closed subset of the weakly compact
Polish space $\Pi_R(\mu, \nu)$ (by Proposition 3.3), it is itself weakly compact and Polish. □

As we shall see in the next sections, the assumption of weak regularity is enough to prove the
existence of a measurable selection of optimal transport plans in the restricted set. But for the
decomposition result, analogous to Theorem 1.1, we need to assume more.

Until the end of the section we shall assume that $R := (\Omega, M^X, M^Y)$ is a linear restriction,
$M^X$, $M^Y$ are ergodic decomposable simplexes, and $\mathcal{A}^0 \subset \mathcal{A}$, $\mathcal{B}^0 \subset \mathcal{B}$ are the corresponding $H$-sufficient
$\sigma$-algebras.

The following property of linear restriction is the key one for the formulation of the main 
decomposition result. 

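Definition 3.6. A linear restriction $R := (\Omega, M^X, M^Y)$ is called an ergodic decomposable
linear restriction, if there exists such an ergodic decomposable simplex $M \subset \mathcal{P}(X \times Y)$ and its
corresponding $H$-sufficient $\sigma$-algebra $(\mathcal{A} \otimes \mathcal{B})^0 \subset \mathcal{A} \otimes \mathcal{B}$, that

• $\Pi_R(X \times Y) \subset M$,

• $\mathcal{A}^0 \otimes \mathcal{B}^0 \subset (\mathcal{A} \otimes \mathcal{B})^0$,

• $\mathrm{Pr}_X(\gamma) \in \partial_e(M^X)$, $\mathrm{Pr}_Y(\gamma) \in \partial_e(M^Y)$, $\forall \gamma \in \partial_e(M)$.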

Let us assume that $R := (\Omega, M^X, M^Y)$ is an ergodic decomposable restriction, $M \subset \mathcal{P}(X \times
Y)$ is the corresponding ergodic decomposable simplex, $Q_M$ and $(\mathcal{A} \otimes \mathcal{B})^0 \subset \mathcal{A} \otimes \mathcal{B}$ are the
associated Markov kernel and $H$-sufficient $\sigma$-algebra.

Let us introduce a property of linear restriction, which assures coherency of simplexes of 
marginal measures and the additional restriction on transport plans. 

Definition 3.7. A linear restriction $R := (\Omega, M^X, M^Y)$ is called coherent for $M$, if the inclusion
$\pi \in \Pi_R(X \times Y)$ implies that $\int_S \omega \, d\pi = 0$ for any $\omega \in \Omega$, $S \in (\mathcal{A} \otimes \mathcal{B})^0$.

In practice, we shall use the following sufficient condition for coherency. 

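Proposition 3.8. If there exists such a family $\{F_a\}$ of maps $F_a : B(X \times Y, \mathcal{A} \otimes \mathcal{B}) \to
B(X \times Y, \mathcal{A} \otimes \mathcal{B})$, $a \in A$, that

(1) $F_a(gf) = g F_a(f)$ for any $g \in B(X \times Y, (\mathcal{A} \otimes \mathcal{B})^0)$, $f \in B(X \times Y, \mathcal{A} \otimes \mathcal{B})$, $a \in A$,

(2) linear restrictions $R = (M^X, M^Y, \Omega_{\mathrm{cont}})$ and $R_{\mathrm{meas}} = (M^X, M^Y, \Omega_{\mathrm{meas}})$ are equivalent,
where $\Omega_{\mathrm{cont}} := \mathrm{span}\{f - F_a(f) : f \in C_b(X \times Y), a \in A\}$, $\Omega_{\mathrm{meas}} := \mathrm{span}\{f -
F_a(f) : f \in B(X \times Y, \mathcal{A} \otimes \mathcal{B}), a \in A\}$,

then $R$ is a coherent linear restriction.

Proof. We wish to prove that $\int I_S (f - F_a(f)) \, d\pi = 0$ for any $f \in C_b(X \times Y)$, $S \in (\mathcal{A} \otimes \mathcal{B})^0$,
$\pi \in \Pi_R(X \times Y)$, $a \in A$, where $I_S$ is the indicator function of $S$. Since $\Pi_R(X \times Y) =
\Pi_{R_{\mathrm{meas}}}(X \times Y)$, $\int (f - F_a(f)) \, d\pi = 0$ for any $\pi \in \Pi_R(X \times Y)$, $f \in B(X \times Y, \mathcal{A} \otimes \mathcal{B})$, $a \in A$.
The statement of interest follows from the fact that $\int I_S (f - F_a(f)) \, d\pi = \int (I_S f - F_a(I_S f)) \, d\pi$
and $I_S f \in B(X \times Y, \mathcal{A} \otimes \mathcal{B})$ for every $f \in C_b(X \times Y)$, $S \in (\mathcal{A} \otimes \mathcal{B})^0$, $a \in A$. □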

Remark 3.9. In the case $\Pi_R(\mu, \nu) = \Pi(\mu, \nu) \cap M$, $\forall (\mu, \nu) \in M^X \times M^Y$ for some ergodic
decomposable restriction $R$, it is true that $R$ is coherent for $M$. Indeed, by definition, $M$ is the
set of all $Q_M$-invariant measures from $\mathcal{P}(X \times Y)$. It is enough to consider $F_1(f) := Q_M(f)$, $A =
\{1\}$, $\tilde\Omega := \{f - Q_M(f), \ \forall f \in C_b(X \times Y)\}$. It follows that the restriction $\tilde R := (\tilde\Omega, M^X, M^Y)$
is coherent and equivalent to $R$.

As we see later, the assumptions of coherency, ergodic decomposability, and weak regularity 
together imply the decomposition result for the associated Kantorovich problem. 

Let $d : X \times X \to \mathbb{R}$ be a given distance function, $\mathrm{Dom}_Q \subset \mathcal{P}(X)$ be an ergodic decomposable
simplex, $R = (\Omega, \mathrm{Dom}_Q, \mathrm{Dom}_Q)$ be a linear restriction. Define for each number $p \in [1, +\infty)$ a
function $W_p^R : \mathrm{Dom}_Q \times \mathrm{Dom}_Q \to [0, \infty]$ by the formula:

(3.7) $W_p^R(\mu, \nu) := \inf \Big\{ \Big( \int d^p(x, y) \, d\pi \Big)^{1/p} : \pi \in \Pi_R(\mu, \nu) \Big\}$.

This function does not satisfy the distance axioms in general. This motivates one more
assumption about the linear restriction, which ensures that $W_p^R$ is a distance function.
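
One comparison is immediate from the definition and may help to read (3.7). The short derivation below is a sketch, assuming (as in the rest of the paper) that $\Pi_R(\mu,\nu)$ consists of the plans from $\Pi(\mu,\nu)$ annihilating every $\omega \in \Omega$; the symbol $W_p$ for the classical, unrestricted Wasserstein distance is introduced here only for illustration.

```latex
% Restricting the admissible plans can only increase the infimum
% (with the convention \inf \emptyset = +\infty):
\Pi_R(\mu,\nu) \subset \Pi(\mu,\nu)
\quad\Longrightarrow\quad
W_p^R(\mu,\nu)
  = \inf_{\pi \in \Pi_R(\mu,\nu)} \Big( \int d^p \, d\pi \Big)^{1/p}
  \;\ge\;
  \inf_{\pi \in \Pi(\mu,\nu)} \Big( \int d^p \, d\pi \Big)^{1/p}
  = W_p(\mu,\nu).
```

In particular, $W_p^R(\mu,\mu) = 0$ requires the diagonal plan $(\mathrm{Id},\mathrm{Id})_\# \mu$ to be admissible, which is not automatic for an arbitrary $\Omega$.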

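Definition 3.10. A linear restriction $R = (\Omega_Q, \mathrm{Dom}_Q, \mathrm{Dom}_Q)$ is a geometric linear restriction
for $Q$ iff the simplex $\mathrm{Dom}_Q$ is weakly closed, $\Omega_Q \subset C_b(X \times Y)$, and for every $\omega \in \Omega_Q$ it
is true that:

(1) $((\mathrm{Id}, \mathrm{Id})_\# \mu)(\omega) = 0 \ \forall \mu \in \mathrm{Dom}_Q$,

(2) $(\mu \otimes \nu)(\omega) = 0 \ \forall \mu, \nu \in \mathrm{Dom}_Q$,

(3) $\pi(\omega) = 0 \Rightarrow \pi^T(\omega) = 0 \ \forall \pi \in \Pi(\mu, \nu), \forall \mu, \nu \in \mathrm{Dom}_Q$, where $\pi^T$ is defined as follows:

$\int f(x, y) \, d\pi^T = \int f(y, x) \, d\pi \quad \forall f \in C_b(X \times Y)$.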

Proposition 3.11. If $R$ is a geometric linear restriction, then $W_p^R$ is an extended distance
function.

Proof. Since $\pi = (\mathrm{Id}, \mathrm{Id})_\# \mu \in \Pi_R(\mu, \mu)$, $\int d^p \, d\pi = 0$ for this $\pi$, and $W_p^R(\mu, \mu) = 0$. By the
same reason ($d^p = 0$ only on the diagonal $\{(x, x)\}$, and only plans of the form $(\mathrm{Id}, \mathrm{Id})_\# \mu$ are
concentrated on it) $W_p^R(\mu, \nu) \ne 0$ if $\mu \ne \nu$. The function $W_p^R$ is symmetric, because the inclusion
$\pi \in \Pi_R(\mu, \nu)$ implies $\pi^T \in \Pi_R(\nu, \mu)$ and $\int d^p \, d\pi = \int d^p \, d\pi^T$. The triangle inequality can be proved
using the standard technique (see, for example, [2], Theorem 2.2) with the use of the special
version of the “gluing” lemma that is formulated below. □

Lemma 3.12 (version of gluing). Let $R = (\Omega_Q, \mathrm{Dom}_Q, \mathrm{Dom}_Q)$ be a geometric linear restriction.
Then for all measures $\mu_1, \mu_2, \mu_3 \in \mathrm{Dom}_Q$, $\pi_{12} \in \Pi_R(\mu_1, \mu_2)$, $\pi_{23} \in \Pi_R(\mu_2, \mu_3)$
there is a measure $\gamma \in \mathcal{P}(X \times X \times X)$, such that $(\mathrm{Pr}_{12})_\#(\gamma) = \pi_{12}$, $(\mathrm{Pr}_{23})_\#(\gamma) = \pi_{23}$,
$(\mathrm{Pr}_{13})_\#(\gamma) \in \Pi_R(\mu_1, \mu_3)$.
Proof. Let us modify a proof of the standard gluing lemma (see Lemma 2.2. of 0 ). Define a
subspace $V \subset C_b(X \times X \times X)$ as follows:

$V := \{ f_{12}(x_1, x_2) + f_{23}(x_2, x_3) + \omega_{13}(x_1, x_3) : f_{12}, f_{23} \in C_b(X \times X), \ \omega_{13} \in \Omega_Q \}$.

Let $F(f_{12} + f_{23} + \omega_{13}) := \pi_{12}(f_{12}) + \pi_{23}(f_{23})$. Let us check that $F$ is well-defined on $V$. Consider
two representations of some element of $V$: $f_{12} + f_{23} + \omega_{13} = \tilde f_{12} + \tilde f_{23} + \tilde\omega_{13}$. Note that
$\omega_{13}(x_1, x_3) - \tilde\omega_{13}(x_1, x_3) = \omega_1(x_1) + \omega_3(x_3)$ for some $\omega_1, \omega_3 \in \Omega_Q$. Then $f_{12} - \tilde f_{12} + \omega_1 =
\tilde f_{23} - f_{23} - \omega_3$, and therefore both sides of the equality depend only on $x_2$. Since

$\pi_{12}(f_{12} - \tilde f_{12} + \omega_1) = \mu_2(f_{12} - \tilde f_{12}) = \mu_2(\tilde f_{23} - f_{23}) = \pi_{23}(\tilde f_{23} - f_{23} - \omega_3)$,

the map $F$ is well-defined on $V$. It is easy to check that $F$ is a bounded positive linear
functional. The corresponding version of the Hahn-Banach theorem (Theorem 1.25 of [1]) states
that such a functional can be extended to a positive bounded functional on the entire $C_b(X \times
X \times X)$. Since on the subspaces $C_b(X_k)$ the value of $F$ coincides with the integration w.r.t. the
measures $\mu_k$, one can apply the Riesz theorem (Theorem 7.10.6 in [5]) to the extension of $F$. The
resulting measure satisfies all the required properties from the statement of the lemma we
are proving. □
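
For intuition, the statement of the lemma can be checked by hand on a finite space. The sketch below does not follow the Hahn-Banach argument of the proof; it uses the standard conditional-independence gluing $\gamma(x_1,x_2,x_3) = \pi_{12}(x_1,x_2)\,\pi_{23}(x_2,x_3)/\mu_2(x_2)$, which in this toy setting happens to keep the (1,3)-marginal invariant. The space $\mathbb{Z}_4$, the shift `T`, and all function names are assumptions made for illustration only.

```python
from fractions import Fraction

N = 4  # hypothetical state space Z_4


def T(x):
    """Cyclic shift; the diagonal action is (x, y) -> (T(x), T(y))."""
    return (x + 1) % N


def orbit_plan(k, weight):
    """Mass `weight` spread uniformly over the diagonal orbit {(x, x + k)}."""
    return {(x, (x + k) % N): Fraction(weight) / N for x in range(N)}


def mix(*plans):
    """Sum of measures given as dicts point -> mass."""
    out = {}
    for plan in plans:
        for point, mass in plan.items():
            out[point] = out.get(point, 0) + mass
    return out


def marginal(gamma, axes):
    """Push a measure on a product forward to the coordinates listed in `axes`."""
    out = {}
    for point, mass in gamma.items():
        key = tuple(point[i] for i in axes)
        out[key] = out.get(key, 0) + mass
    return out


def glue(pi12, pi23):
    """Standard gluing: gamma = pi12(x1, x2) * pi23(x2, x3) / mu2(x2)."""
    mu2 = marginal(pi12, (1,))
    gamma = {}
    for (x1, x2), m12 in pi12.items():
        for (y2, x3), m23 in pi23.items():
            if y2 == x2:
                gamma[(x1, x2, x3)] = m12 * m23 / mu2[(x2,)]
    return gamma


def is_invariant(plan):
    """Invariance of a plan on Z_4 x Z_4 under the diagonal action of T."""
    return all(plan.get((T(x), T(y)), 0) == plan.get((x, y), 0)
               for x in range(N) for y in range(N))


# Two invariant plans between the uniform measure and itself.
pi12 = mix(orbit_plan(0, Fraction(1, 2)), orbit_plan(1, Fraction(1, 2)))
pi23 = mix(orbit_plan(2, Fraction(1, 4)), orbit_plan(3, Fraction(3, 4)))
gamma = glue(pi12, pi23)
pi13 = marginal(gamma, (0, 2))
```

Exact rational arithmetic is used so that the marginal identities can be compared with `==` rather than up to a tolerance.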

Proposition 3.13. Geometric linear restriction is weakly regular. 

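Proof. Since $(\mu \otimes \nu)(\omega) = 0$, we have $\mu \otimes \nu \in \Pi_R(\mu, \nu)$ for any $\mu, \nu \in \mathrm{Dom}_Q$. Since $\mathrm{Dom}_Q$ is weakly closed
and $\Omega \subset C_b(X \times Y)$, it follows that $R = (\Omega, \mathrm{Dom}_Q, \mathrm{Dom}_Q)$ is weakly regular (see Remark
3.4). □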

Let us check that for our main examples (2.20 and 3.14 of invariant and stationary measures
respectively) it is possible to define additional linear restrictions, such that all (or most of) the
properties introduced above are satisfied.

Example 3.14 (Main example). Let us consider the linear restriction $R = (\Omega, \mathrm{Dom}, \mathrm{Dom})$,
where $\mathrm{Dom}$ is a simplex of all invariant measures as in Example 2.20,

$\Omega := \mathrm{span}(\{f - f \circ g : \forall f \in C_b(X \times X), \forall g \in G\})$.

It corresponds to a restriction on transport plans to be invariant w.r.t. a diagonal action of $G$
on $X \times X$ (see m , Prop. 5.1 for the proof): $g(x, y) = (g(x), g(y))$.

Let $M = \mathcal{P}_G(X \times Y)$ be a simplex of all invariant measures on $X \times Y$ w.r.t. this action,
$(\mathcal{A} \otimes \mathcal{B})^{inv}$ be a $\sigma$-algebra of all invariant Borel subsets of $X \times Y$. It is clear that $\Pi_R(X \times Y) \subset
M$, $\mathcal{A}^{inv} \otimes \mathcal{B}^{inv} \subset (\mathcal{A} \otimes \mathcal{B})^{inv}$, and $(\mathcal{A} \otimes \mathcal{B})^{inv}$ is the $H$-sufficient subalgebra associated to $M$.
To show ergodic decomposability of $R$ (Definition 3.6), we need to check that every ergodic
measure from $\mathcal{P}_G(X \times Y)$ has ergodic marginals, which belong to $\mathcal{P}_G(X)$. It can be easily
shown by contradiction.

Let us prove that the restriction is geometric (Definition 3.10):

(1) the set $\mathcal{P}_G(X)$ is closed and $\Omega \subset C_b(X \times X)$,

(2) $(\mathrm{Id}, \mathrm{Id})_\# \mu (f(x, y) - f(g(x), g(y))) = \mu(f(x, x) - f(g(x), g(x))) = \mu(f(x, x)) - g_\# \mu(f(x, x)) =
0$ if $\mu \in \mathrm{Dom} = \mathcal{P}_G(X)$, $f \in C_b(X \times X)$, $g \in G$, $\mu(f) := \int f \, d\mu$,

(3) note that

$(\mu \otimes \nu)(f(x, y) - f(g(x), g(y))) = \int \Big( \int f(x, y) - f(g(x), y) \, d\mu(x) \Big) d\nu(y) +
\int \Big( \int f(g(x), y) - f(g(x), g(y)) \, d\nu(y) \Big) d\mu(x) = 0, \quad \forall \mu, \nu \in \mathrm{Dom} = \mathcal{P}_G(X), \ f \in C_b(X \times X), \ g \in G$,

(4) since the inclusion $f(x, y) - f(g(x), g(y)) \in \Omega$ implies $f(y, x) - f(g(y), g(x)) \in \Omega$ for
any $f \in C_b(X \times X)$, the last requirement from the definition of geometricity is satisfied.

We also need to check that the restriction is coherent w.r.t. $M$ (Definition 3.7). Note that
$\Pi_R(\mu, \nu) = \Pi(\mu, \nu) \cap \mathcal{P}_G(X \times Y)$, $\forall (\mu, \nu) \in M^X \times M^Y$, and use Remark 3.9.
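
The three algebraic conditions of geometricity can be tested directly on a finite toy model. The following sketch assumes $X = \mathbb{Z}_6$ with the group generated by the shift `g(x) = x + 2` (so the orbits are $\{0,2,4\}$ and $\{1,3,5\}$); these choices and all function names are hypothetical and serve only as an illustration. On a finite space a plan annihilates every $f - f \circ g$ exactly when it gives equal mass to $(x,y)$ and $(g(x),g(y))$, which is what `annihilates_omega` tests.

```python
from fractions import Fraction

N = 6
X = range(N)


def g(x):
    """Hypothetical generator of the group action; orbits are {0,2,4} and {1,3,5}."""
    return (x + 2) % N


def invariant_measure(a):
    """Mixture a * unif{0,2,4} + (1 - a) * unif{1,3,5}, which is invariant under g."""
    a = Fraction(a)
    return {x: (a if x % 2 == 0 else 1 - a) / 3 for x in X}


def diagonal_plan(mu):
    """The plan (Id, Id)_# mu from condition (1)."""
    return {(x, x): m for x, m in mu.items()}


def product_plan(mu, nu):
    """The product plan from condition (2)."""
    return {(x, y): mu[x] * nu[y] for x in X for y in X}


def transpose(plan):
    """The transposed plan from condition (3)."""
    return {(y, x): m for (x, y), m in plan.items()}


def annihilates_omega(plan):
    """True iff the integral of f - f o g vanishes for every f on X x X.
    On a finite space it suffices to test indicator functions of points."""
    return all(plan.get((x, y), 0) == plan.get((g(x), g(y)), 0)
               for x in X for y in X)


mu = invariant_measure(Fraction(2, 3))
nu = invariant_measure(Fraction(1, 4))
# An invariant plan supported on one diagonal orbit, and a non-invariant plan.
orbit_plan = {(x, (x + 1) % N): Fraction(1, 3) for x in (0, 2, 4)}
point_mass = {(0, 0): Fraction(1)}
```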

Remark 3.15. For the example with a simplex of invariant measures $\mathrm{Dom}$ (Example 2.20)
it is also possible to consider other meaningful and well-behaved linear restrictions. Assume
that there is a given action of a group $G$ on $X$. One can consider the action of the direct product of
groups $G \oplus G$ on $X \times X$ defined in the natural way: $(g_1, g_2)(x_1, x_2) := (g_1(x_1), g_2(x_2))$.

Let us fix a subgroup $H \subset G \oplus G$ with the induced action on $X \times X$. If the projections of
the subgroup on the first and second components coincide with $G$, the associated restriction
$R = (\Omega, \mathrm{Dom}, \mathrm{Dom})$ with

$\Omega := \mathrm{span}(\{f - f \circ h : \forall f \in C_b(X \times X), \forall h \in H\})$

has the properties of ergodic decomposability and coherency defined in this section. It can be
checked in the same way as for the diagonal action of a group.

Example 3.16 (Discrete-time Markov process). Let the simplex $\mathrm{Dom} = \mathcal{P}_Q(X)$ be defined as in
Example 2.21 and let $\mathcal{A}^0$ be its associated $\sigma$-subalgebra. Consider the Markov transition kernel
$Q_M$ from $\mathcal{A} \otimes \mathcal{A}$ to $\mathcal{A}^0 \otimes \mathcal{A}^0$ that is defined by the formula

$Q_M(f)(x, y) := \int f(x', y') \, d(Q^x \otimes Q^y)(x', y')$.

It can be checked that $Q_M(f Q_M(g)) = Q_M(f) Q_M(g)$, and, therefore, by Theorem 2.16 there is
an associated ergodic decomposable simplex $M$. Let us consider the following linear restriction
$R = (\Omega, \mathrm{Dom}, \mathrm{Dom})$,

$\Omega := \mathrm{span}(\{f - Q_M(f) : \forall f \in C_b(X \times X)\})$.

It is ergodic decomposable and coherent w.r.t. $M$ (the arguments are analogous to the ones
from the previous example).
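
The product identity for $Q_M$ used above can be verified exactly in a small finite instance. The sketch below assumes a hypothetical kernel on $X = \mathbb{Z}_6$ for which $Q^x$ is the uniform measure on the parity class of $x$; the test functions `f` and `g` are arbitrary. It is an illustration of the identity, not the general construction of Example 2.21.

```python
from fractions import Fraction
from itertools import product

X = range(6)


def support(x):
    """Support of the hypothetical kernel Q^x: the parity class of x."""
    return [z for z in X if z % 2 == x % 2]


def Q_M(f):
    """Q_M(f)(x, y) = integral of f w.r.t. the product measure Q^x (x) Q^y."""
    def averaged(x, y):
        points = list(product(support(x), support(y)))
        return sum(Fraction(f(u, v)) for u, v in points) / len(points)
    return averaged


def f(x, y):
    return x * x + 3 * y


def g(x, y):
    return (x + 1) * (y + 2)


Qf, Qg = Q_M(f), Q_M(g)


def lhs(x, y):
    """Q_M(f * Q_M(g)) evaluated at (x, y)."""
    return Q_M(lambda u, v: f(u, v) * Qg(u, v))(x, y)


def rhs(x, y):
    """Q_M(f) * Q_M(g) evaluated at (x, y)."""
    return Qf(x, y) * Qg(x, y)
```

The identity holds here because `Qg` is constant on each product of parity classes, so it factors out of the inner average.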

4. Measurable selection of optimal transport plans 

The goal of this section is to prove the existence of a measurable map $f : M^X \times M^Y \to
\Pi_R^{\mathrm{opt}}(X \times Y)$, such that $\mathrm{Pr}(f(\mu, \nu)) = (\mu, \nu)$ (recall that $\mathrm{Pr}(\pi) := (\mathrm{Pr}_X(\pi), \mathrm{Pr}_Y(\pi))$), under
the assumptions of weak regularity of $R$ and lower semi-continuity of the cost functional. Existence
of such a map is required in the proof of the decomposition result, which we shall formulate in
the next section, and it seems to be of interest in itself.

In the Kantorovich problem without additional linear restriction this fact is well-known (see
Corollary 5.22 of [26]). Its proof relies on the sufficiency result: c-cyclical monotonicity of the
support of a transport plan implies its optimality. There is no known analogue of this result
in the case of the problem with additional restriction. Thus we need to invent a more direct
way to prove measurability.

Our main tool is the following theorem of Rieder: 


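Theorem 4.1 (Rieder [22], Th. 4.1, Cor. 4.3). Let $(\Theta, \mathcal{A})$, $(S, \mathcal{B})$ be Polish spaces with Borel
$\sigma$-algebras, $D \subset \Theta \times S$, $u : D \to \mathbb{R} \cup \{+\infty\}$,

$L_c := \{(\theta, s) \in D : u(\theta, s) \le c\}, \quad c \in \mathbb{R}$,

$L_{+\infty} := D$. If $\forall c \in (-\infty, +\infty]$, $L_c \in \mathcal{A} \otimes \mathcal{B}$, and $\forall c \in \mathbb{R}$

$L_c(\theta) := L_c \cap \{(\theta, s) : s \in S\}$

is compact for any $\theta \in \mathrm{Pr}_\Theta(D)$, then there exists a measurable function $f : \mathrm{Pr}_\Theta(D) \to S$, such
that

$u(\theta, f(\theta)) = \inf_{s \in D(\theta)} u(\theta, s), \quad \forall \theta \in \mathrm{Pr}_\Theta(D)$,

where $D(\theta) := \mathrm{Pr}_\Theta^{-1}(\theta) \cap D$.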

Let $\Theta := M^X \times M^Y$, $S := \Pi_R(X \times Y)$ be equipped with the topology of weak convergence
and the corresponding Borel $\sigma$-algebras. Both $\Theta$ and $S$ are Polish spaces, and due to
weak regularity of $R$, $S$ is closed. Let $D := \{(\mu, \nu, \pi) : \pi \in \Pi_R(\mu, \nu), (\mu, \nu) \in M^X \times M^Y\}$,
$u((\mu, \nu), \pi) := \mathrm{Cost}(\pi)$. Then

$L_c = \{(\mu, \nu, \pi) : \mathrm{Cost}(\pi) \le c, \ \pi \in \Pi_R(\mu, \nu), \ (\mu, \nu) \in M^X \times M^Y\}, \quad c \in (-\infty, +\infty]$,

$L_c(\mu, \nu) = \{\pi : \mathrm{Cost}(\pi) \le c, \ \pi \in \Pi_R(\mu, \nu)\}, \quad c \in \mathbb{R}$.

To apply the theorem above, we have to prove that $L_c$ is Borel, and that $L_c(\mu, \nu)$ is compact.
We are going to use the following proposition (Himmelberg [15], Theorem 3.5).


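Proposition 4.2. Let $(\Theta, \mathcal{A})$, $(S, \mathcal{B})$ be separable metrizable spaces with Borel $\sigma$-algebras.
Let $T : \Theta \to 2^S$ be a map with values in closed subsets of $S$, such that for every closed subset
$V \subset S$

$\{\theta : T(\theta) \cap V \ne \emptyset\} \in \mathcal{A}$.

Then the graph of $T$:

$\{(\theta, s) : \theta \in \Theta, \ s \in T(\theta)\}$

is in $\mathcal{A} \otimes \mathcal{B}$.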


Lemma 4.3. If $R$ is weakly regular and $\mathrm{Cost} : \Pi_R(X \times Y) \to \mathbb{R}$ is l.s.c., then the set
$L_c := \{(\mu, \nu, \pi) : \mathrm{Cost}(\pi) \le c, \ \pi \in \Pi_R(\mu, \nu), \ (\mu, \nu) \in M^X \times M^Y\}$, $c \in (-\infty, +\infty]$,
is an element of the $\sigma$-algebra $\mathrm{Bor}(M^X \times M^Y) \otimes \mathrm{Bor}(\Pi_R(X \times Y))$.

Proof. By weak regularity (see Proposition 3.3), $\Pi_R(X \times Y)$ is a Polish space, and $\Pi_R(\mu, \nu)$ is Polish and
compact. Since $\mathrm{Cost}$ is an l.s.c. functional, its lower level sets $\Pi_c := \{\pi \in \mathcal{P}(X \times Y) :
\mathrm{Cost}(\pi) \le c\}$ are closed. Since $L_c(\mu, \nu) = \Pi_R(\mu, \nu) \cap \Pi_c$, $L_c(\mu, \nu)$ is compact. It should be
noted that $L_c(\mu, \nu)$ can be empty, but the empty set is compact, hence there is no contradiction
here.

Let $(\Theta, \mathcal{A}) := (M^X \times M^Y, \mathrm{Bor}(M^X \times M^Y))$, $(S, \mathcal{B}) := (\Pi_R(X \times Y), \mathrm{Bor}(\Pi_R(X \times Y)))$,
$T : M^X \times M^Y \to 2^{\Pi_R(X \times Y)}$, $T(\mu, \nu) = L_c(\mu, \nu)$. Then $L_c = \{(\mu, \nu, \pi) : \pi \in L_c(\mu, \nu), (\mu, \nu) \in
M^X \times M^Y\}$ is the graph of $T$. Note that $T$ has compact (hence closed) values.

Fix an arbitrary closed set $V \subset \Pi_R(X \times Y)$. Let us show that the set

$T^{-1}(V) := \{(\mu, \nu) : L_c(\mu, \nu) \cap V \ne \emptyset\}$

is closed, hence Borel. If the set is empty, it is trivially true. Assume it is not. Let $(\mu_n, \nu_n) \in
T^{-1}(V)$, $n \in \mathbb{N}$, be a sequence converging to $(\mu, \nu)$ in $M^X \times M^Y$. Then $\mu_n \to \mu$, $\nu_n \to \nu$ weakly,
and the sequences $(\mu_n)$, $(\nu_n)$ are tight. But for each $(\mu_n, \nu_n)$ there exists a $\pi_n \in L_c(\mu_n, \nu_n) \cap V$.
It follows that $\mathrm{Pr}(\pi_n) = (\mu_n, \nu_n)$, and the sequence $(\pi_n)$ is tight (by considering products of
compact sets). Let $(\pi_{n_k})$ be a weakly convergent subsequence with limit $\pi$. Since $V$ is closed,
it is sequentially closed, and $\pi \in V$. Due to weak continuity of $\mathrm{Pr}$ and weak l.s.c. of $\mathrm{Cost}$, it is
clear that $\mathrm{Pr}(\pi) = (\mu, \nu)$, $\mathrm{Cost}(\pi) \le c$. It follows by weak regularity of $R$ that the functional
$\pi \mapsto \int \omega \, d\pi$ is weakly continuous for any $\omega \in \Omega$. Hence $\int \omega \, d\pi = 0$, and $\pi \in L_c(\mu, \nu)$.

We obtain that for $(\mu, \nu)$ there is a $\pi \in L_c(\mu, \nu) \cap V$, hence $(\mu, \nu) \in T^{-1}(V)$, and $T^{-1}(V)$ is
sequentially closed. Since $M^X \times M^Y$ is metrizable, sequential closedness implies closedness, and
$T^{-1}(V)$ is closed, hence Borel. By Proposition 4.2 the graph of $T$, which coincides with $L_c$, is
a Borel set. This concludes the proof. □

We are ready to formulate and prove the main result of this section. 

Theorem 4.4. Let $X$, $Y$ be Polish spaces with Borel $\sigma$-algebras. If $R = (\Omega, M^X, M^Y)$ is a
weakly regular linear restriction, and $\mathrm{Cost}$ is a weakly l.s.c. cost functional, then there exists a
measurable map $f : M^X \times M^Y \to \Pi_R^{\mathrm{opt}}(X \times Y)$, such that $\mathrm{Pr}(f(\mu, \nu)) = (\mu, \nu)$.

Proof. Let us apply Theorem 4.1. Recall that $\Theta := M^X \times M^Y$, $S := \Pi_R(X \times Y)$ are
Polish spaces with Borel $\sigma$-algebras, $D := \{(\mu, \nu, \pi) : \pi \in \Pi_R(\mu, \nu), (\mu, \nu) \in M^X \times M^Y\}$,
$u((\mu, \nu), \pi) := \mathrm{Cost}(\pi)$, $L_c = \{(\mu, \nu, \pi) : \mathrm{Cost}(\pi) \le c, \ \pi \in \Pi_R(\mu, \nu), \ (\mu, \nu) \in M^X \times M^Y\}$ is
Borel for every $c \in (-\infty, +\infty]$ by Lemma 4.3, and $L_c(\mu, \nu) = \{\pi : \mathrm{Cost}(\pi) \le c, \ \pi \in \Pi_R(\mu, \nu)\}$
is compact for every $c \in \mathbb{R}$ by weak regularity of $R$ and lower semi-continuity of $\mathrm{Cost}$.
It follows directly from the application of Theorem 4.1 that there exists a Borel function
$f : M^X \times M^Y \to \Pi_R(X \times Y)$, such that

$\mathrm{Cost}(f(\mu, \nu)) = \inf_{\pi \in \Pi_R(\mu, \nu)} \mathrm{Cost}(\pi)$,

which implies that $f(\mu, \nu) \in \Pi_R^{\mathrm{opt}}(\mu, \nu)$ for any $(\mu, \nu) \in M^X \times M^Y$. □

5. Decomposition of Kantorovich problem 

In this section we formulate and prove an analogue of Theorem 1.1 for the restricted Kantorovich
problem under the assumptions of weak regularity, ergodic decomposability, and coherency
of linear restrictions.

Let $(X, \mathcal{A})$, $(Y, \mathcal{B})$ be two Polish spaces with Borel $\sigma$-algebras, $M^X \subset \mathcal{P}(X)$ and $M^Y \subset \mathcal{P}(Y)$
be two closed ergodic decomposable simplexes. Denote by $Q_X$ and $Q_Y$ the Markov kernels on
$(X, \mathcal{A})$, $(Y, \mathcal{B})$, associated with $M^X$ and $M^Y$, such that $\partial_e(M^X) = \{Q_X^x\}$, $\partial_e(M^Y) = \{Q_Y^y\}$.
If $\mathcal{A}^0 \subset \mathcal{A}$ and $\mathcal{B}^0 \subset \mathcal{B}$ are their corresponding $H$-sufficient subalgebras, then, by definition,
$Q_X$ and $Q_Y$ are decompositions for the triples $(\mathcal{A}, \mathcal{A}^0, M^X)$ and $(\mathcal{B}, \mathcal{B}^0, M^Y)$ respectively. We
shall use the notation: $\xi(x) := Q_X^x$, $\eta(y) := Q_Y^y$. Note that the maps $\xi : X \to \partial_e(M^X)$,
$\eta : Y \to \partial_e(M^Y)$ are $\mathcal{A}^0$- and $\mathcal{B}^0$-measurable respectively.

Let $R = (\Omega, M^X, M^Y)$ be a weakly regular, ergodic decomposable and coherent w.r.t. $M$ linear
restriction, where $M \subset \mathcal{P}(X \times Y)$ is the corresponding ergodic decomposable simplex, $Q_M$
and $(\mathcal{A} \otimes \mathcal{B})^0 \subset \mathcal{A} \otimes \mathcal{B}$ are the associated Markov kernel and $H$-sufficient $\sigma$-algebra.

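Definition 5.1. Denote by $\hat\Pi(\hat\mu, \hat\nu)$ the set of all probability measures $\hat\pi \in \mathcal{P}(X \times Y, (\mathcal{A} \otimes \mathcal{B})^0)$,
such that $\mathrm{Pr}_X(\hat\pi) = \hat\mu$, $\mathrm{Pr}_Y(\hat\pi) = \hat\nu$.

Definition 5.2. Let $\mu \in M^X$, $\nu \in M^Y$. Define the set $\Theta(R, \mu, \nu)$ as the set of all pairs
$(\pi, Q_\pi)$, where $\pi \in \Pi_R(\mu, \nu)$, and $Q_\pi$ is such a Markov transition kernel from $(X \times Y, \mathcal{A} \otimes \mathcal{B})$
to $(X \times Y, (\mathcal{A} \otimes \mathcal{B})^0)$, that $Q_\pi(\hat\pi) = \pi$ and $Q_\pi^{(x,y)} \in \Pi_R(Q_X^x, Q_Y^y)$ for $\hat\pi$-a.e. $(x, y) \in X \times Y$.

Here $\hat\pi$ is the restriction of the measure $\pi$ from $\mathcal{A} \otimes \mathcal{B}$ to $(\mathcal{A} \otimes \mathcal{B})^0$.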

In the following Lemma we use properties of ergodic decomposability and coherency of the 
linear restriction. 

Lemma 5.3. Under the assumptions made above, for any $\pi \in \Pi_R(\mu, \nu)$ there exists a Markov
transition kernel $Q_\pi$, such that $(\pi, Q_\pi) \in \Theta(R, \mu, \nu)$.

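Proof. For each measure $\pi \in \Pi(\mu, \nu)$ define the measure $\hat\pi$ as the restriction of $\pi$ to $(\mathcal{A} \otimes \mathcal{B})^0$.
Its marginals, $\mathrm{Pr}_X(\hat\pi)$ and $\mathrm{Pr}_Y(\hat\pi)$, are the restrictions of $\mu$ and $\nu$ to the subalgebras $\mathcal{A}^0$ and $\mathcal{B}^0$
respectively. Hence, the inclusion $\pi \in \Pi(\mu, \nu)$ implies that $\hat\pi \in \hat\Pi(\hat\mu, \hat\nu)$ (see Definition 5.1).

Let $Q_\pi := Q_M$, where $Q_M$ is the Markov kernel from the definition of ergodic decomposability
of the restriction $R$. This property implies $Q_\pi(\hat\pi) = \pi$ and $Q_\pi^{(x,y)} \in \mathcal{P}(X \times Y)$. Moreover,

$\mathrm{Pr}(Q_\pi^{(x,y)}) = (\xi(x), \eta(y))$ for $\hat\pi$-a.e. $(x, y)$.

Let us check that $\hat\pi$-a.e. $Q_\pi^{(x,y)}(\omega) = 0$ for each function $\omega \in \Omega$. Let us use the following
equality:

(5.1) $\int h \, dQ_\pi^{(x,y)} = E_\pi(h \mid (\mathcal{A} \otimes \mathcal{B})^0)$ $\pi$-a.e. $\forall h \in B(X \times Y, \mathcal{A} \otimes \mathcal{B})$.

By coherency of $R$ and the definition of conditional expectation, we obtain:

$\int_S \Big( \int \omega \, dQ_\pi^{(x,y)} \Big) d\pi = \int_S \omega \, d\pi = 0, \quad \forall S \in (\mathcal{A} \otimes \mathcal{B})^0$.

Hence, $\int \omega \, dQ_\pi^{(x,y)} = 0$ for $\pi$-a.e. $(x, y)$.

As we just proved, $Q_\pi^{(x,y)} \in \Pi_R(\xi(x), \eta(y))$ for $\hat\pi$-a.e. $(x, y)$. Since the map $(x, y) \mapsto Q_\pi^{(x,y)}$ is
$(\mathcal{A} \otimes \mathcal{B})^0$-measurable, $Q_\pi$ is a Markov transition kernel from $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ to $(X \times Y, (\mathcal{A} \otimes \mathcal{B})^0)$.
It follows from (5.1) that

$\int Q_\pi(f) \, d\hat\pi = \int f \, d\pi, \quad \forall f \in B(X \times Y, \mathcal{A} \otimes \mathcal{B})$,

which implies $(\pi, Q_\pi) \in \Theta(R, \mu, \nu)$ and concludes the proof. □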

Let $c : X \times Y \to \mathbb{R}$ be such a measurable function that the functional $\mathrm{Cost} : \mathcal{P}(X \times Y) \to
\mathbb{R} \cup \{+\infty\}$, defined as $\mathrm{Cost}(\pi) = \int c \, d\pi$, is lower semi-continuous w.r.t. the
topology of weak convergence.

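Lemma 5.4. Under the assumptions made above, there exists a Markov transition kernel $Q_{\mathrm{opt}}$
from $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ to $(X \times Y, (\mathcal{A} \otimes \mathcal{B})^0)$, such that

(1) $\int c \, dQ_{\mathrm{opt}}^{(x,y)} = \inf \{ \int c \, d\pi : \pi \in \Pi_R(\xi(x), \eta(y)) \} \ \forall (x, y) \in X \times Y$,

(2) for any pair $(\pi, Q_\pi) \in \Theta(R, \mu, \nu)$, $\int c \, dQ_{\mathrm{opt}}(\hat\pi) \le \int c \, dQ_\pi(\hat\pi)$, where $\hat\pi$ is the restriction
of $\pi$ from $\mathcal{A} \otimes \mathcal{B}$ to $(\mathcal{A} \otimes \mathcal{B})^0$,

(3) for each measure $\hat\pi \in \hat\Pi(\hat\mu, \hat\nu)$, $(Q_{\mathrm{opt}}(\hat\pi), Q_{\mathrm{opt}}) \in \Theta(R, \mu, \nu)$, where $\hat\mu$ is the restriction
of $\mu$ from $\mathcal{A}$ to $\mathcal{A}^0$, $\hat\nu$ is the restriction of $\nu$ from $\mathcal{B}$ to $\mathcal{B}^0$.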

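Proof. By Theorem 4.4 there exists a measurable map $f : M^X \times M^Y \to \Pi_R^{\mathrm{opt}}(X \times Y)$, such
that $\mathrm{Pr}(f(\mu, \nu)) = (\mu, \nu)$. Let $Q_{\mathrm{opt}}^{(x,y)} := f(\xi(x), \eta(y))$. Since $x \mapsto \xi(x)$ and $y \mapsto \eta(y)$ are
measurable maps w.r.t. the algebras $\mathcal{A}^0$ and $\mathcal{B}^0$ respectively, $Q_{\mathrm{opt}}$ is in fact a Markov transition
kernel from $(X \times Y, \mathcal{A} \otimes \mathcal{B})$ to $(X \times Y, \mathcal{A}^0 \otimes \mathcal{B}^0)$ and, consequently, a Markov transition kernel
to $(\mathcal{A} \otimes \mathcal{B})^0$. It follows by definition that

$\int c \, dQ_{\mathrm{opt}}^{(x,y)} = \inf \Big\{ \int c \, d\gamma : \gamma \in \Pi_R(\xi(x), \eta(y)) \Big\}$.

Let $(\pi, Q_\pi) \in \Theta(R, \mu, \nu)$. Then $\int c \, dQ_\pi(\hat\pi) := \int \Big( \int c \, dQ_\pi^{(x,y)} \Big) d\hat\pi$. Since $Q_\pi^{(x,y)} \in \Pi_R(\xi(x), \eta(y))$
$\hat\pi$-a.e., $\int c \, dQ_\pi^{(x,y)} \ge \int c \, dQ_{\mathrm{opt}}^{(x,y)}$ $\hat\pi$-a.e., which implies $\int c \, dQ_\pi(\hat\pi) \ge \int c \, dQ_{\mathrm{opt}}(\hat\pi)$.

Let $\hat\pi \in \hat\Pi(\hat\mu, \hat\nu)$, where $\hat\mu$, $\hat\nu$ are the restrictions of the measures $\mu \in M^X$, $\nu \in M^Y$ to $\mathcal{A}^0$, $\mathcal{B}^0$
respectively. By definition of a Markov transition kernel, $Q_{\mathrm{opt}}(\hat\pi) \in \mathcal{P}(X \times Y)$. Let us check
that $Q_{\mathrm{opt}}(\hat\pi) \in \Pi_R(\mu, \nu)$. Since $Q_{\mathrm{opt}}^{(x,y)} \in \Pi_R(\xi(x), \eta(y))$ for every pair $(x, y) \in X \times Y$, it can be
shown that $\mathrm{Pr}(Q_{\mathrm{opt}}(\hat\pi)) = (\mu, \nu)$. Let us provide an argument for the first marginal:

(5.2) $\int \int f(\tilde x) \, dQ_{\mathrm{opt}}^{(x,y)}(\tilde x, \tilde y) \, d\hat\pi(x, y) = \int \int f(\tilde x) \, dQ_X^x(\tilde x) \, d\hat\pi(x, y) =
\int \int f(\tilde x) \, dQ_X^x(\tilde x) \, d\hat\mu(x) = \int f(x) \, d\mu, \quad \forall f \in B(X, \mathcal{A})$,

where the last equality follows from the fact that $Q_X$ is a decomposition for $(\mathcal{A}, \mathcal{A}^0, M^X)$.
Analogously, it can be checked for the second marginal. It is clear that if $\int \omega \, dQ_{\mathrm{opt}}^{(x,y)} = 0$ for
any $(x, y)$, it implies the equality $\int \int \omega \, dQ_{\mathrm{opt}}^{(x,y)} \, d\hat\pi = 0$. Thus $Q_{\mathrm{opt}}(\hat\pi) \in \Pi_R(\mu, \nu)$, and, hence,
there exists a measure $\pi \in \Pi_R(\mu, \nu)$, such that $\pi = Q_{\mathrm{opt}}(\hat\pi)$. □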


Let us formulate and prove the main result. 

Theorem 5.5 (Main theorem). Let $(X, \mathcal{A})$, $(Y, \mathcal{B})$ be two Polish spaces with Borel $\sigma$-algebras;
$c : X \times Y \to \mathbb{R}$ be such a measurable function that the functional $\mathrm{Cost} : \mathcal{P}(X \times Y) \to \mathbb{R} \cup \{+\infty\}$,
defined as $\mathrm{Cost}(\pi) = \int c \, d\pi$, is lower semi-continuous w.r.t. the topology
of weak convergence; $M^X$, $M^Y$ be two ergodic decomposable simplexes, $\mathcal{A}^0 \subset \mathcal{A}$, $\mathcal{B}^0 \subset \mathcal{B}$ be the
corresponding $\sigma$-subalgebras; $R = (\Omega, M^X, M^Y)$ be a weakly regular, ergodic decomposable,
and coherent w.r.t. $M$ linear restriction, where $M \subset \mathcal{P}(X \times Y)$ is the associated ergodic
decomposable simplex, $Q_M$ and $(\mathcal{A} \otimes \mathcal{B})^0 \subset \mathcal{A} \otimes \mathcal{B}$ are the associated Markov kernel and
$H$-sufficient $\sigma$-algebra. Then

(5.3) $\inf \Big\{ \int c \, d\pi : \pi \in \Pi_R(\mu, \nu) \Big\} = \inf \Big\{ \int Q_{\mathrm{opt}}(c) \, d\hat\pi : \hat\pi \in \hat\Pi(\hat\mu, \hat\nu) \Big\} =
\inf \Big\{ \int \inf \Big\{ \int c \, d\pi : \pi \in \Pi_R(\xi(x), \eta(y)) \Big\} d\hat\pi : \hat\pi \in \hat\Pi(\hat\mu, \hat\nu) \Big\}$.

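Proof. Let

$\Theta_{\mathrm{opt}}(R, \mu, \nu) := \{ (Q_{\mathrm{opt}}(\hat\pi), Q_{\mathrm{opt}}) : \hat\pi \in \hat\Pi(\hat\mu, \hat\nu) \}$,

where $Q_{\mathrm{opt}}$ is as in Lemma 5.4. By this Lemma, $\Theta_{\mathrm{opt}}(R, \mu, \nu) \subset \Theta(R, \mu, \nu)$, thus,

$\inf \Big\{ \int Q_\pi(c) \, d\hat\pi : (\pi, Q_\pi) \in \Theta(R, \mu, \nu) \Big\} \le \inf \Big\{ \int Q_{\mathrm{opt}}(c) \, d\hat\pi : \hat\pi \in \hat\Pi(\hat\mu, \hat\nu) \Big\}$.

As follows from Lemma 5.3,

$\inf \Big\{ \int c \, d\pi : \pi \in \Pi_R(\mu, \nu) \Big\} = \inf \Big\{ \int Q_\pi(c) \, d\hat\pi : (\pi, Q_\pi) \in \Theta(R, \mu, \nu) \Big\}$.

By Lemma 5.4, $\int c \, dQ_{\mathrm{opt}}(\hat\pi) \le \int c \, dQ_\pi(\hat\pi)$ for all $(\pi, Q_\pi) \in \Theta(R, \mu, \nu)$. Hence,

$\inf \Big\{ \int Q_\pi(c) \, d\hat\pi : (\pi, Q_\pi) \in \Theta(R, \mu, \nu) \Big\} \ge \inf \Big\{ \int Q_{\mathrm{opt}}(c) \, d\hat\pi : \hat\pi \in \hat\Pi(\hat\mu, \hat\nu) \Big\}$.

Thus, we conclude

$\inf \Big\{ \int c \, d\pi : \pi \in \Pi_R(\mu, \nu) \Big\} = \inf \Big\{ \int Q_{\mathrm{opt}}(c) \, d\hat\pi : \hat\pi \in \hat\Pi(\hat\mu, \hat\nu) \Big\}$.

The equality

$\inf \Big\{ \int c \, d\pi : \pi \in \Pi_R(\mu, \nu) \Big\} = \inf \Big\{ \int \inf \Big\{ \int c \, d\pi : \pi \in \Pi_R(\xi(x), \eta(y)) \Big\} d\hat\pi : \hat\pi \in \hat\Pi(\hat\mu, \hat\nu) \Big\}$

follows from the explicit form of $Q_{\mathrm{opt}}$, described in Lemma 5.4. □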
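
The two-level structure of the decomposition can be made concrete in a finite toy case. The sketch below assumes $X = Y = \mathbb{Z}_6$, invariance under the shift `T(x) = x + 2`, and the circular distance as cost (all hypothetical choices). It evaluates the nested infimum: first the optimal invariant cost between each pair of ergodic measures, then an ordinary transport problem between the weights of the ergodic components. It then assembles the corresponding mixture of ergodic optimal plans and checks that it is an admissible invariant plan with the same cost. It does not solve the restricted problem on the left-hand side independently; that the two values agree is the content of the theorem.

```python
from fractions import Fraction
from itertools import product

N = 6
X = range(N)


def T(x):
    """Hypothetical dynamics on Z_6 with orbits {0,2,4} and {1,3,5}."""
    return (x + 2) % N


def cost(x, y):
    """Circular distance on Z_6 (the case p = 1)."""
    return min((x - y) % N, (y - x) % N)


def orbit(start, step):
    """Orbit of `start` under repeated application of `step`."""
    pts, z = [], start
    while z not in pts:
        pts.append(z)
        z = step(z)
    return pts


# Ergodic measures are the uniform measures on the two T-orbits.
components = [orbit(0, T), orbit(1, T)]


def ergodic_value(A, B):
    """Least mean cost over the diagonal orbits inside A x B, with a minimising orbit.
    The uniform measure on such an orbit is an invariant plan between unif(A) and unif(B)."""
    def diag(p):
        return (T(p[0]), T(p[1]))

    def mean(orb):
        return Fraction(sum(cost(x, y) for x, y in orb), len(orb))

    orbits = {tuple(sorted(orbit(p, diag))) for p in product(A, B)}
    best = min(sorted(orbits), key=mean)
    return mean(best), best


def decomposed_problem(a, b):
    """Outer problem between the weights (a, 1 - a) and (b, 1 - b).
    The cost is linear in the one free parameter t, so the endpoints suffice."""
    d = [[ergodic_value(A, B) for B in components] for A in components]
    candidates = []
    for t in (max(Fraction(0), a + b - 1), min(a, b)):
        gam = [[t, a - t], [b - t, 1 - a - b + t]]
        value = sum(gam[i][j] * d[i][j][0] for i in range(2) for j in range(2))
        candidates.append((value, gam))
    value, gam = min(candidates, key=lambda c: c[0])
    # Mixture of ergodic optimal plans, weighted by the optimal outer coupling.
    plan = {}
    for i in range(2):
        for j in range(2):
            for point in d[i][j][1]:
                plan[point] = plan.get(point, 0) + gam[i][j] / len(d[i][j][1])
    return value, plan


a, b = Fraction(2, 3), Fraction(1, 4)
value, plan = decomposed_problem(a, b)
```

Here the weights `a` and `b` are the masses that the two measures give to the orbit $\{0,2,4\}$; the ergodic cost matrix is small enough that the outer problem reduces to a one-parameter search.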


6. Decomposition of Wasserstein-like distances 

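In this section we discuss an application of the decomposition theorem (Theorem 5.5) to
the theory of Wasserstein-like distances.

Let $d : X \times X \to \mathbb{R}$ be a given distance function, $\mathrm{Dom}_Q \subset \mathcal{P}(X)$ be an ergodic decomposable
simplex. Recall that by Wasserstein-like distance we mean the function $W_p^R : \mathrm{Dom}_Q \times
\mathrm{Dom}_Q \to [0, \infty]$, defined by the formula (3.7).

For any $[0, +\infty]$-valued distance function $d$ on the set $\partial_e \mathrm{Dom}_Q$ of extreme points of $\mathrm{Dom}_Q$
define an extension of this function to the entire $\mathrm{Dom}_Q$:

(6.1) $d_p(\mu, \nu) := \inf \Big\{ \Big( \int d^p(\xi, \eta) \, d\hat\pi \Big)^{1/p} : \hat\pi \in \Pi(\hat\mu, \hat\nu) \Big\}$,

where $\hat\mu, \hat\nu \in \mathcal{P}(\partial_e \mathrm{Dom}_Q)$, $\Pi(\hat\mu, \hat\nu) := \{\pi \in \mathcal{P}(\partial_e \mathrm{Dom}_Q \times \partial_e \mathrm{Dom}_Q) : \mathrm{Pr}_1(\pi) = \hat\mu, \mathrm{Pr}_2(\pi) = \hat\nu\}$,
$\mathrm{bar}(\hat\mu) = \mu$, $\mathrm{bar}(\hat\nu) = \nu$.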
We are able to formulate the following statement. 
Theorem 6.1 (Decomposition of a restricted Wasserstein distance). Let $(X, d)$ be a metric
Polish space equipped with the Borel $\sigma$-algebra $\mathcal{A}$, $\mathrm{Dom}_Q \subset \mathcal{P}(X)$ be a closed ergodic decomposable
simplex with the associated Markov kernel $Q$, such that $\{Q^x\} = \partial_e \mathrm{Dom}_Q$, $\mathcal{A}^0$ be
a corresponding $H$-sufficient $\sigma$-subalgebra, $R = (\Omega, \mathrm{Dom}_Q, \mathrm{Dom}_Q)$ be a geometric, ergodic
decomposable, and coherent linear restriction w.r.t. $M$, where $M \subset \mathcal{P}(X \times Y)$ is the corresponding
ergodic decomposable simplex, $Q_M$ and $(\mathcal{A} \otimes \mathcal{B})^0 \subset \mathcal{A} \otimes \mathcal{B}$ are the associated Markov
kernel and $H$-sufficient $\sigma$-algebra.

Then for $d := W_p^R|_{\partial_e \mathrm{Dom}_Q}$ (the restriction of $W_p^R$ to the set $\partial_e \mathrm{Dom}_Q$) it is true that

$W_p^R(\mu, \nu) = d_p(\mu, \nu), \quad \forall \mu, \nu \in \mathrm{Dom}_Q$,

where $d_p$ is defined by the formula (6.1). Moreover, $W_p^R$ is a $[0, +\infty]$-valued distance function
on $\mathrm{Dom}_Q$.


Proof. Since $R = (\Omega, \mathrm{Dom}_Q, \mathrm{Dom}_Q)$ is a geometric restriction, it follows that it is weakly
regular. It is known that $d^p$ is bounded below and is a lower semi-continuous function on
$X \times X$. Hence $\pi \mapsto \int d^p \, d\pi$ is a weakly lower semi-continuous functional on $\mathcal{P}(X \times X)$ (w.r.t. the
topology of weak convergence). Since the hypothesis of Theorem 5.5 is satisfied, one can apply
it to obtain

$\inf \Big\{ \int d^p \, d\pi : \pi \in \Pi_R(\mu, \nu) \Big\} = \inf \Big\{ \int \inf \Big\{ \int d^p \, d\pi : \pi \in \Pi_R(\xi(x), \eta(y)) \Big\} d\hat\pi : \hat\pi \in \hat\Pi(\hat\mu, \hat\nu) \Big\}$.

Here $\hat\Pi(\hat\mu, \hat\nu)$ is as in Definition 5.1. Since

(6.2) $(x, y) \mapsto \inf \Big\{ \int d^p \, d\pi : \pi \in \Pi_R(\xi(x), \eta(y)) \Big\}$

is measurable w.r.t. $\mathcal{A}^0 \otimes \mathcal{B}^0$, and $\mathcal{A}^0 \otimes \mathcal{B}^0 \subset (\mathcal{A} \otimes \mathcal{B})^0$, it is true that one can replace $\hat\Pi(\hat\mu, \hat\nu)$
with $\Pi(\hat\mu, \hat\nu)$ without any change of the infimum:

$\inf \Big\{ \int d^p \, d\pi : \pi \in \Pi_R(\mu, \nu) \Big\} = \inf \Big\{ \int \inf \Big\{ \int d^p \, d\pi : \pi \in \Pi_R(\xi(x), \eta(y)) \Big\} d\hat\pi : \hat\pi \in \Pi(\hat\mu, \hat\nu) \Big\}$.

By the definitions of the distances $d_p$, $W_p^R$ and due to the established fact about equivalence
of $\sigma$-algebras on the space of measures (Proposition 2.19), the obtained equality is equivalent
to the equality

$W_p^R = d_p$.

It follows from Proposition 3.11 that $W_p^R(\mu, \nu)$ is actually a $[0, +\infty]$-valued distance function on
$\mathrm{Dom}_Q$. □


It can be noted that the example of decomposition described in the introduction
is a particular case of the statement just proved.